Backing up and restoring a repository

Back up a repository

Repository backups allow users to revert a GraphDB repository to a previous state. The database offers several ways to copy the repository state:

  • Export the repository to an RDF file. This operation can run in parallel with reads and writes, but it takes longer to complete.
  • Back up a repository using the JMX interface or curl.
  • Copy the repository image directory to a backup location. This is a much faster option, but in non-cluster setups it requires shutting down the database process.

Note

We recommend scheduling all repository backups during periods of low user activity.

Export repository to an RDF file

The repository export works without having to stop GraphDB. This operation usually takes longer than copying the low-level file system, because all explicit RDF statements must be serialized and deserialized over HTTP. Updates made after the export operation starts are not included in the dump. Several interfaces are available for invoking the export:

Option 1: Export the repository with the GraphDB Workbench.

Export the database contents using the Workbench. To preserve the contexts (named graphs) when exporting/importing the whole database, use a context-aware RDF file format, e.g., TriG.

  1. Go to Explore > Graphs overview.
  2. Choose the graphs you want to export.
  3. Click Export graph as TriG.
_images/export_TriG.png

Option 2: Export all statements with curl.

The repository's /statements endpoint supports dumping all explicit statements (replace repositoryId with a valid repository name):

curl -X GET -H "Accept:application/x-trig" "http://localhost:7200/repositories/repositoryId/statements?infer=false" > export.trig

This method streams a snapshot of the database’s explicit statements into the export.trig file.
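
For exports triggered from a Java application, the curl call above can be approximated with the JDK's built-in HTTP client. The following is a minimal sketch assuming Java 11 or later and the same URL and Accept header as the curl example; the class and method names (StatementsExport, exportUri, export) are illustrative and not part of GraphDB or RDF4J.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

final class StatementsExport {
    // Builds the same URL as the curl example; infer=false requests explicit statements only.
    static URI exportUri(String baseUrl, String repositoryId) {
        return URI.create(baseUrl + "/repositories/" + repositoryId + "/statements?infer=false");
    }

    // Streams the response body to targetFile and returns the HTTP status code.
    // The body is written even for error responses, so the caller should check the status.
    static int export(String baseUrl, String repositoryId, Path targetFile)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(exportUri(baseUrl, repositoryId))
                .header("Accept", "application/x-trig")
                .GET()
                .build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(targetFile));
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Replace repositoryId with a valid repository name, as in the curl example.
        int status = export("http://localhost:7200", "repositoryId", Path.of("export.trig"));
        System.out.println("HTTP status: " + status);
    }
}
```

A status other than 200 suggests that export.trig contains an error message rather than data and should not be kept as a backup.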

Option 3: Export all statements using the RDF4J API.

The same operation can be executed from Java code by calling the RepositoryConnection.exportStatements() method with the includeInferred flag set to false (to return only the explicit statements).

Example:

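// Requires org.eclipse.rdf4j.repository.RepositoryConnection and org.eclipse.rdf4j.rio.{Rio, RDFFormat, RDFWriter}.
// "repository" is an already initialized org.eclipse.rdf4j.repository.Repository.
try (RepositoryConnection connection = repository.getConnection();
     OutputStream outputStream = new FileOutputStream("export.nq")) {
    RDFWriter writer = Rio.createWriter(RDFFormat.NQUADS, outputStream);
    // false = skip inferred statements, so only explicit statements are exported
    connection.exportStatements(null, null, null, false, writer);
}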

exportStatements() passes every explicit statement in the repository to the supplied handler, and any of the RDF4J RDF writer implementations can be used to output the statements in the chosen format.

Note

If the data will be re-imported, we recommend the N-Quads format, as it can easily be broken into large "chunks" that can be inserted and committed separately.
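
Because N-Quads is line-based (one statement per line), an export can be chunked by simply splitting the file on line boundaries. The following is a minimal sketch assuming Java 11 or later; the NQuadsSplitter class, the chunk file naming, and the chunk size are illustrative choices, not part of GraphDB or RDF4J.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class NQuadsSplitter {
    // Splits an N-Quads file into chunk files of at most linesPerChunk lines each
    // and returns the chunk paths in order.
    static List<Path> split(Path source, Path outputDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        Files.createDirectories(outputDir);
        List<Path> chunks = new ArrayList<>();
        BufferedWriter writer = null;
        int linesInChunk = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Start a new chunk file for the first line and whenever the current one is full.
                if (writer == null || linesInChunk == linesPerChunk) {
                    if (writer != null) {
                        writer.close();
                    }
                    Path chunk = outputDir.resolve("chunk-" + (chunks.size() + 1) + ".nq");
                    chunks.add(chunk);
                    writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                    linesInChunk = 0;
                }
                writer.write(line);
                writer.newLine();
                linesInChunk++;
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return chunks;
    }
}
```

For example, NQuadsSplitter.split(Path.of("export.nq"), Path.of("chunks"), 1_000_000) writes chunks/chunk-1.nq, chunks/chunk-2.nq, and so on. If the data contains blank nodes, note that a blank node label appearing in two different chunks may be treated as two distinct nodes when the chunks are imported as separate files.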

Back up a repository using the JMX interface

GraphDB offers backing up a repository through JMX.

The OwlimRepositoryManager MBean provides a createZipBackup(String folder) method. Invoking it with a folder argument creates a zip file named directory_<repository_id>-<timestamp>_backup.zip with the contents of the repository data directory (the storage folder is a subfolder of it, so config.ttl should be archived as well). If you pass only a directory name rather than a full path, the directory is created in the /bin folder.

You can invoke the method from jconsole or by sending an HTTP request with curl:

Back up a repository from jconsole

Invoke the backup from the JMX interface using jconsole, as in this example:

_images/backup.png

Back up a repository using curl

curl -H 'content-type: application/json' -d "{\"type\":\"exec\",\"mbean\":\"com.ontotext:type=OwlimRepositoryManager,name=\\\"Repository (/full_path_to_repository_storage/)\\\"\",\"operation\":\"createZipBackup\",\"arguments\":[\"backupName\"]}" http://localhost:7200/jolokia/

Here is an example where full_path_to_repository_storage is replaced by a real path:

curl -H 'content-type: application/json' -d "{\"type\":\"exec\",\"mbean\":\"com.ontotext:type=OwlimRepositoryManager,name=\\\"Repository (/home/ubuntu/graphdb-se-8.7.0/data/repositories/test/storage/)\\\"\",\"operation\":\"createZipBackup\",\"arguments\":[\"backupName\"]}" http://localhost:7200/jolokia/

This produces a backupName folder in the /bin directory, containing the backup zip.
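
The nested quoting in the curl command is easy to get wrong, so it can help to build the JSON payload programmatically. The following is a minimal sketch assuming Java 11 or later and the same MBean name, operation, and Jolokia URL as the curl example; the JolokiaBackup class and its methods are illustrative helpers, not a GraphDB API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class JolokiaBackup {
    // Escapes backslashes and double quotes so the value can sit inside a JSON string literal.
    static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the same JSON body as the curl example for the given storage path and backup folder.
    static String payload(String storagePath, String backupFolder) {
        String mbean = "com.ontotext:type=OwlimRepositoryManager,name=\"Repository (" + storagePath + ")\"";
        return "{\"type\":\"exec\",\"mbean\":\"" + jsonEscape(mbean)
                + "\",\"operation\":\"createZipBackup\",\"arguments\":[\""
                + jsonEscape(backupFolder) + "\"]}";
    }

    // Posts the payload to the Jolokia endpoint and returns the raw response body.
    static String send(String jolokiaUrl, String payload) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(jolokiaUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        // Path and folder name taken from the curl example above.
        String body = payload("/home/ubuntu/graphdb-se-8.7.0/data/repositories/test/storage/", "backupName");
        System.out.println(send("http://localhost:7200/jolokia/", body));
    }
}
```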

Back up GraphDB by copying the binary image

Note

This is the fastest method to back up a repository, but it requires stopping the database.

All RDF data is stored in the repository's own directory, so copying that directory captures the full repository state.

  1. Stop the GraphDB server.

  2. Manually copy the storage folders to the backup location.

    kill <pid-of-graphdb>
    sleep 10 # wait for the database to stop; increase if shutdown takes longer
    cp -r {graphdb.home.data}/repositories/your-repo backup-dest/date/ # copy the repository data
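
The copy step can also be scripted from Java, for example as part of a scheduled maintenance job. The following is a minimal sketch assuming Java 11 or later and assuming GraphDB has already been stopped; the RepositoryImageCopy class and the dated target layout (backup root, then date, then repository name) are illustrative choices that mirror the cp command above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.stream.Stream;

final class RepositoryImageCopy {
    // Recursively copies source into target. Files.walk lists a directory before its
    // contents, so parent directories always exist when a file is copied.
    static Path copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            paths.forEach(path -> {
                Path dest = target.resolve(source.relativize(path).toString());
                try {
                    if (Files.isDirectory(path)) {
                        Files.createDirectories(dest);
                    } else {
                        // Fails if dest already exists, so an earlier backup is never overwritten.
                        Files.copy(path, dest);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return target;
    }

    // Copies repositoryDir to <backupRoot>/<date>/<repository folder name>.
    static Path backup(Path repositoryDir, Path backupRoot, LocalDate date) throws IOException {
        Path target = backupRoot
                .resolve(date.toString())
                .resolve(repositoryDir.getFileName().toString());
        return copyTree(repositoryDir, target);
    }
}
```

For example, RepositoryImageCopy.backup(Path.of("/path/to/data/repositories/your-repo"), Path.of("backup-dest"), LocalDate.now()) copies the repository into backup-dest/<today's date>/your-repo; the source path here is a placeholder for {graphdb.home.data}/repositories/your-repo.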
    

Tip

For more information about the data directory, see here.

Restore a repository

The restore options depend on the backup format.

Option 1: Restore a repository from an RDF export.

This option will import a previously exported file into an empty repository.

  1. Make sure that the repository is empty, or recreate it with the same repository configuration settings.
  2. Go to Import > RDF and then select the Server files tab.
  3. On the web page, note the directory path shown after the text "Put files that you want to import in".
  4. Copy the RDF file with the backup into this directory path and refresh the page.
  5. Start the file import and wait for the data to be imported.

Option 2: Restore the database from a binary image or zip backup.

  1. Stop the GraphDB server.
  2. Delete the entire {graphdb.home.data}/repositories/your-repo folder and replace it with the your-repo folder from the backup copy (extract it first if the backup is a zip file).
  3. Start the GraphDB server.
  4. Run a quick test read query to check that the repository is initialized correctly.
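
The test read query in the last step can be sent over HTTP. The following is a minimal sketch assuming Java 11 or later, a server at http://localhost:7200, and a repository named your-repo; the RestoreCheck class and the one-row SELECT query are illustrative choices, and any cheap read query would serve the same purpose.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class RestoreCheck {
    // A cheap read query: fetch at most one statement.
    static final String TEST_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 1";

    // Builds the repository query URL with the SPARQL query URL-encoded.
    static URI queryUri(String baseUrl, String repositoryId, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/repositories/" + repositoryId + "?query=" + encoded);
    }

    // Runs the query and returns the HTTP response with the result as a JSON string.
    static HttpResponse<String> run(String baseUrl, String repositoryId)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(queryUri(baseUrl, repositoryId, TEST_QUERY))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = run("http://localhost:7200", "your-repo");
        System.out.println("HTTP status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

A 200 status with one binding in the result indicates that the repository initialized and is readable; an error status or an unexpectedly empty result is a reason to recheck the restored folder.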