Backing up and Restoring a Repository

Back up a repository

Repository backups allow you to revert a GraphDB repository to a previous state. The database offers several approaches to copying the repository state:

  • Export the repository to an RDF file - this operation can run in parallel with reads and writes, but takes longer to complete.

  • Back up a repository using the JMX interface or cURL.

  • Copy the repository image directory to a backup - this is a much faster option, but in non-cluster setups it requires shutdown of the database process.

Note

We recommend scheduling all repository backups during periods of low user activity.

Export repository to an RDF file

The repository export works without stopping GraphDB. This operation usually takes longer than copying the files at the file system level, because all explicit RDF statements must be serialized and deserialized over HTTP. Updates made after the export operation starts are not included in the dump. Several interfaces are available for invoking the export:

Option 1: Export the repository with the GraphDB Workbench.

Export the database contents using the Workbench. To preserve the contexts (named graphs) when exporting and importing the whole database, use a context-aware RDF file format, e.g., TriG.

  1. Go to Explore/Graphs overview.

  2. Choose the graphs you want to export.

  3. Click Export graph as TriG.

_images/export_TriG.png

Option 2: Export all statements with cURL.

The repository endpoint supports dumping all explicit statements with the following command (replace repositoryId with the ID of an existing repository):

curl -X GET -H "Accept:application/x-trig" "http://localhost:7200/repositories/repositoryId/statements?infer=false" > export.trig

This method streams a snapshot of the database’s explicit statements into the export.trig file.
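As a rough sketch, the cURL export can be wrapped in two small shell functions so that each export goes to a timestamped file and HTTP errors are reported. The function names, the default base URL, and the file naming scheme are illustrative assumptions, not part of GraphDB.

```shell
# Build the statements URL for a repository (explicit statements only).
# $1 = base URL (e.g. http://localhost:7200), $2 = repository ID
statements_url() {
  printf '%s/repositories/%s/statements?infer=false' "$1" "$2"
}

# Export a repository to a timestamped TriG file and print the file name.
# $1 = repository ID, $2 = base URL (assumed default: http://localhost:7200)
export_repository() {
  out="export-$1-$(date +%Y%m%d-%H%M%S).trig"
  # --fail makes curl return a non-zero status on HTTP errors (4xx/5xx)
  curl --fail --silent --show-error -X GET \
       -H "Accept: application/x-trig" \
       -o "$out" "$(statements_url "${2:-http://localhost:7200}" "$1")" \
    && printf '%s\n' "$out"
}
```

With a running server, calling export_repository repositoryId would be expected to write a file such as export-repositoryId-<timestamp>.trig to the current directory.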

Option 3: Export all statements using the RDF4J API.

The same operation can be executed from Java code by calling the RepositoryConnection.exportStatements() method with the includeInferred flag set to false (to return only the explicit statements).

Example:

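import java.io.FileOutputStream;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.rio.*;  // RDFFormat, RDFWriter, Rio
// try-with-resources closes the connection and the stream even on failure
try (RepositoryConnection connection = repository.getConnection();
     FileOutputStream outputStream = new FileOutputStream("export.nq")) {
    RDFWriter writer = Rio.createWriter(RDFFormat.NQUADS, outputStream);
    // includeInferred = false: export explicit statements only
    connection.exportStatements(null, null, null, false, writer);
}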

The exportStatements() method passes every explicit statement in the repository to the supplied handler. Any of the RDF4J RDF writer implementations can serve as that handler to output the statements in the chosen format.

Note

If the data will be re-imported, we recommend the N-Quads format as it can easily be broken down into large ‘chunks’ that can be inserted and committed separately.
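The chunking idea can be sketched in shell. N-Quads stores one statement per line, so a file can be cut at any line boundary. The function names and the chunk prefix below are hypothetical, and the import step assumes the target repository accepts POST requests with the application/n-quads content type on its statements URL.

```shell
# Split an N-Quads file into chunks of at most $2 lines each.
# N-Quads holds one statement per line, so any line boundary is a safe cut.
# $1 = input file, $2 = lines per chunk, $3 = output prefix (e.g. chunks/part-)
split_nquads() {
  split -l "$2" "$1" "$3"
}

# Import each chunk with its own request against the statements URL.
# $1 = chunk prefix, $2 = statements URL of the target repository
import_chunks() {
  for chunk in "$1"*; do
    curl --fail --silent --show-error -X POST \
         -H "Content-Type: application/n-quads" \
         --data-binary "@$chunk" "$2" || return 1
  done
}
```

Blank node identifiers are typically scoped to a single upload, so data that relies on blank nodes may need all related statements kept in the same chunk.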

Back up a repository using JMX interface

GraphDB supports backing up a repository through JMX.

Use the createZipBackup(String backupName) method of the OwlimRepositoryManager MBean. It creates a zip file named rep_<repository_id>-<timestamp>_backup.zip containing the repository data directory. Because the storage folder is a subfolder of that directory, config.ttl is archived as well. By default, the file is created in the backup/backupName directory of the repository’s folder.
To change the location of the backup directory, set the runtime property -Dgraphdb.backup.base.folder=<full_path_to_target_folder>. The backup is then created in the <full_path_to_target_folder>/backup/backupName folder.
Invoking the method with a null backupName creates the backup in the default folder.

Note

Any attempt to create a backup with an invalid backupName will result in the following message:

“Backup name must start with a letter, digit, or underscore.
Each subsequent character may be a letter, digit, underscore, dash, or period.”
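To catch an invalid name before calling the MBean, the rule quoted in the message can be approximated with a shell pattern. This is only a sketch: the function name is hypothetical, and it assumes that "letter" and "digit" mean ASCII characters, which may be narrower than what GraphDB actually accepts.

```shell
# Return 0 if $1 looks like a valid backup name, 1 otherwise.
# Assumption: "letter" and "digit" are ASCII only; GraphDB may accept more.
is_valid_backup_name() {
  case "$1" in
    # empty, bad first character, or any disallowed character anywhere
    '' | [!A-Za-z0-9_]* | *[!A-Za-z0-9_.-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```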

You can invoke the method from JConsole, or by sending an HTTP request via cURL:

Back up a repository from JConsole

Invoke backup from the JMX interface using JConsole:

_images/backup_jconsole.png

Back up a repository using cURL

curl -H 'content-type: application/json' -d "{\"type\":\"exec\",\"mbean\":\"com.ontotext:type=OwlimRepositoryManager,name=\\\"Repository (/full_path_to_repository_storage/)\\\"\",\"operation\":\"createZipBackup\",\"arguments\":[\"backupName\"]}" http://localhost:7200/jolokia/

Here is an example where full_path_to_repository_storage is replaced by a real path:

curl -H 'content-type: application/json' -d "{\"type\":\"exec\",\"mbean\":\"com.ontotext:type=OwlimRepositoryManager,name=\\\"Repository (/home/ubuntu/graphdb-se-8.7.0/data/repositories/test/storage/)\\\"\",\"operation\":\"createZipBackup\",\"arguments\":[\"backupName\"]}" http://localhost:7200/jolokia/

This creates a folder named backupName, containing the backup zip file, in the backup/ directory of the test repository.
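Because the nested quoting in the request body is easy to get wrong, it may help to generate the body with a small helper. The sketch below builds the same JSON as the commands above; the function names are hypothetical, and it assumes that neither the storage path nor the backup name contains quotes or backslashes.

```shell
# Print the Jolokia request body for createZipBackup.
# $1 = full path to the repository storage folder, $2 = backup name
# Assumes neither argument contains quotes or backslashes (no JSON escaping).
backup_payload() {
  printf '{"type":"exec","mbean":"com.ontotext:type=OwlimRepositoryManager,name=\\"Repository (%s)\\"","operation":"createZipBackup","arguments":["%s"]}' "$1" "$2"
}

# Send the request to the Jolokia endpoint.
# $1 = storage path, $2 = backup name, $3 = Jolokia URL (assumed default below)
create_zip_backup() {
  curl --fail --silent --show-error -H 'content-type: application/json' \
       -d "$(backup_payload "$1" "$2")" "${3:-http://localhost:7200/jolokia/}"
}
```

Printing the output of backup_payload before sending it is a quick way to check that the MBean name is quoted as intended.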

Back up GraphDB by copying the binary image

Note

This is the fastest method to back up a repository, but it requires stopping the database.

All RDF data is stored only in your repository.

  1. Stop the GraphDB server.

  2. Manually copy the storage folders to the backup location.

    kill <pid-of-graphdb>    # ask GraphDB to shut down
    while kill -0 <pid-of-graphdb> 2>/dev/null; do sleep 1; done    # wait until the process has actually exited
    cp -r {graphdb.home.data}/repositories/your-repo backup-dest/date/    # copies GraphDB's data
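The stop-and-copy steps can also be sketched as two reusable shell functions: one that waits for the GraphDB process to exit instead of sleeping for a fixed time, and one that copies the repository folder into a dated backup directory. The function names, the 60-second default timeout, and the date-based directory layout are assumptions made for illustration.

```shell
# Wait until a process has exited, or give up after a timeout.
# $1 = PID, $2 = timeout in seconds (assumed default: 60)
# Assumes the caller is allowed to signal the process (same user or root).
wait_for_exit() {
  waited=0
  while kill -0 "$1" 2>/dev/null; do
    [ "$waited" -ge "${2:-60}" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Copy a repository folder into a dated backup directory and print that directory.
# $1 = repository folder, $2 = backup root
copy_repo() {
  dest="$2/$(date +%Y-%m-%d)"
  mkdir -p "$dest" && cp -r "$1" "$dest/" && printf '%s\n' "$dest"
}
```

A typical sequence would be to send the kill signal, call wait_for_exit with the GraphDB PID, and run copy_repo only if the wait succeeded.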
    

Tip

For more information about the data directory, see here.

Restore a repository

The restore options depend on the backup format.

Option 1: Restore a repository from an RDF export.

This option will import a previously exported file into an empty repository.

  1. Make sure that the repository is empty or recreated with the same repository configuration settings.

  2. Go to Import ‣ RDF, and select the Server files tab.

  3. Press the Help button to see the directory path where you need to place the files or directories to be imported.

  4. Copy the RDF file with the backup into this directory path and refresh the Workbench.

  5. Start the file import and wait for the data to be imported.

Option 2: Restore the database from a binary image or zip backup.

  1. Make sure that the repository is empty or recreated with the same repository configuration settings.

  2. Stop the GraphDB server.

  3. Delete the entire {graphdb.home.data}/repositories/your-repo folder and replace it with the your-repo folder from the backup copy (if the backup is a zip archive, extract it first).

  4. Start the GraphDB server.

  5. Run a quick test read query to make sure that the repository is initialized correctly.
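The folder replacement and the final test query can be sketched in shell as follows. The function names and the default base URL are hypothetical. The test query assumes the repository answers standard SPARQL GET requests at /repositories/<repository ID>.

```shell
# Replace a repository folder with its backup copy. Run only while GraphDB is stopped.
# $1 = backed-up repository folder, $2 = live repository folder
restore_repo() {
  [ -d "$1" ] || { echo "backup not found: $1" >&2; return 1; }
  # ${2:?...} aborts if the target is empty, so rm -rf never runs on ""
  rm -rf "${2:?target folder required}"
  cp -r "$1" "$2"
}

# Run a one-row SELECT against the repository; succeeds if the server answers.
# $1 = repository ID, $2 = base URL (assumed default: http://localhost:7200)
smoke_test() {
  curl --fail --silent --show-error -G \
       -H 'Accept: application/sparql-results+json' \
       --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' \
       "${2:-http://localhost:7200}/repositories/$1"
}
```

After restarting GraphDB, a successful smoke_test call suggests that the repository was initialized, although it does not verify that all data was restored.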