Backing up and recovering a repository

Backing up a repository

Several options are available:

Option 1: Using the GraphDB Workbench

Note

Best used for a small running system.

Export the database contents using the Workbench. To preserve the contexts (named graphs) when exporting or importing the whole database, use a context-aware RDF file format such as TriG.

  1. Go to Explore/Graphs overview.
  2. Choose the graphs you want to export.
  3. Click Export graph as TriG.

Option 2: Exporting the data of each repository

Note

Works without stopping GraphDB, but is very slow.

  1. Export the data of each repository, while the database is running.

    Note

    Updates executed after the export has started are not included in the exported data, because GraphDB uses READ COMMITTED transaction isolation.

  2. Shut down the database (stop Tomcat) and delete the older GraphDB application(s): the .war files and the expanded folder.

Option 3: Using the graph store protocol and curl

This can be achieved on the command line in a single step using the graph store protocol (change the repository URL and name of the export file accordingly).

curl -X GET -H "Accept:application/x-trig" "http://localhost:7200/repositories/test/statements?infer=false" > export.trig
This method streams a snapshot of the database’s explicit statements into the export.trig file.
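The curl command above can be wrapped in a small script so that the repository URL and output file are parameters. The following is a minimal sketch, not an official tool: the function names `statements_url` and `export_repo` and the dated file name in the example are illustrative assumptions. It adds curl's `--fail` option so that an HTTP error makes the command exit with a non-zero status instead of quietly saving an error page as the backup.

```shell
# Build the statements endpoint URL for a repository (explicit statements only).
statements_url() {
  local base_url="$1" repo_id="$2"
  printf '%s/repositories/%s/statements?infer=false\n' "$base_url" "$repo_id"
}

# Stream the repository into a TriG file.
# --fail makes curl return a non-zero exit code on HTTP errors.
export_repo() {
  local base_url="$1" repo_id="$2" out_file="$3"
  curl --fail --silent --show-error \
       -H "Accept: application/x-trig" \
       -o "$out_file" \
       "$(statements_url "$base_url" "$repo_id")"
}

# Example (hypothetical output file name):
# export_repo "http://localhost:7200" "test" "export-$(date +%Y-%m-%d).trig"
```

Checking the exit code of `export_repo` before rotating or deleting older exports helps avoid replacing a good backup with a failed one.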

Option 4: Programmatically using the RDF4J API

Use the RepositoryConnection.exportStatements() method with the includeInferred flag set to false (in order not to serialise the inferred statements).

Example:

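// Classes come from java.io, org.eclipse.rdf4j.repository and org.eclipse.rdf4j.rio.
// try-with-resources closes the connection and the stream even if the export fails.
try (RepositoryConnection connection = repository.getConnection();
     FileOutputStream outputStream = new FileOutputStream(new File("/tmp/test.txt"))) {
    RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, outputStream);
    // includeInferred = false: export explicit statements only
    connection.exportStatements(null, null, null, false, writer);
}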

Alternatively, use the RepositoryConnection.getStatements() method with the includeInferred flag set to false (so that inferred statements are not serialised) and write each statement yourself.

Example:

java.io.OutputStream out = ...;
RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, out);
writer.startRDF();
// includeInferred = false: iterate over explicit statements only.
// try-with-resources closes the result even if writing fails.
try (RepositoryResult<Statement> statements =
        repositoryConnection.getStatements(null, null, null, false)) {
    while (statements.hasNext()) {
        writer.handleStatement(statements.next());
    }
}
writer.endRDF();
out.flush();

The returned iterator can be used to visit every explicit statement in the repository and one of the RDF4J RDF writer implementations can be used to output the statements in the chosen format. If the data will be re-imported, we recommend the N-Triples format as it can easily be broken into large ‘chunks’ that can be inserted and committed separately.
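Because N-Triples stores one statement per line, an export can be cut on line boundaries and every piece remains a valid N-Triples file. The sketch below shows one way to do this with the standard `split` utility. The function name `chunk_ntriples`, the file names, and the chunk size are illustrative assumptions.

```shell
# Split an N-Triples file into chunks of at most <lines_per_chunk> lines.
# Output files are named <prefix>aa, <prefix>ab, ... (the default split suffixes).
chunk_ntriples() {
  local in_file="$1" lines_per_chunk="$2" prefix="$3"
  split -l "$lines_per_chunk" "$in_file" "$prefix"
}

# Example (hypothetical names): one million statements per chunk
# chunk_ntriples export.nt 1000000 export-chunk-
```

Each resulting chunk can then be imported and committed on its own, which keeps individual transactions small.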

Option 5: Copying GraphDB storage folders

Note

It is very fast but requires stopping GraphDB.

  1. Stop GraphDB/Tomcat.

  2. Manually copy the storage folders to the backup location.

    kill <pid-of-graphdb>
    sleep 10  # give GraphDB time to stop
    mkdir -p ~/your-backups/TODAY-DATE/
    cp -r {your data directory}/repositories/your-repo ~/your-backups/TODAY-DATE/

Tip

For more information about the data directory, see here.
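The copy step in Option 5 can be scripted so that each backup lands in a folder named after the current date. This is a minimal sketch that assumes GraphDB has already been stopped and that repositories live under `<data directory>/repositories/<repository id>`, as in the commands above. The function name `backup_repo` and all paths are placeholders.

```shell
# Copy one repository's storage folder into a dated backup directory.
# Assumes GraphDB has already been stopped; all paths are placeholders.
backup_repo() {
  local data_dir="$1" repo_id="$2" backup_root="$3"
  local src="$data_dir/repositories/$repo_id"
  local dest="$backup_root/$(date +%Y-%m-%d)"

  [ -d "$src" ] || { echo "No such repository folder: $src" >&2; return 1; }
  # Refuse to merge into an existing backup taken on the same day.
  [ ! -e "$dest/$repo_id" ] || { echo "Backup already exists: $dest/$repo_id" >&2; return 1; }

  mkdir -p "$dest" || return 1
  cp -r "$src" "$dest/" || return 1
  echo "$dest/$repo_id"   # print where the copy was written
}

# Example (hypothetical paths):
# backup_repo /opt/graphdb/data your-repo ~/your-backups
```

The function prints the destination path, so a calling script can record it or pass it to an archiving step.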

Restoring a repository

Several options are available:

Option 1: Importing data with preserved contexts in RDF4J Workbench

Note

Best used for a small running system.

  1. Go to Add.
  2. Choose Data format: TriG.
  3. Choose RDF Data File: e.g., export.trig.
  4. Clear the context text field (it will have been set to the URL of the file). If this is not cleared, all the imported RDF statements will be given a context of file://export.trig or similar.
  5. Upload.
You can also use the TriX format (an XML-based context-aware RDF serialisation).

Option 2: Importing data with preserved contexts in GraphDB Workbench

Note

Best used for a small running system.

See Loading data.

Option 3: Replacing the GraphDB storage directory (and any subdirectories)

Note

Use this option only if the repository can be shut down.

  1. Replace the entire contents of the storage directory (and any subdirectories) with the backup.
  2. Restart the repository.
  3. Check the log file to ensure a successful startup.
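The replacement in step 1 can be sketched as a short shell function. It assumes that GraphDB is stopped, that the backup folder holds the contents of a single repository's storage folder, and that the target is `<data directory>/repositories/<repository id>`. The function name `restore_repo` and all paths are placeholders. Restarting the repository and checking the log remain manual steps.

```shell
# Replace a repository's storage folder with the contents of a backup.
# Assumes GraphDB has been stopped; all paths are placeholders.
restore_repo() {
  local backup_dir="$1" data_dir="$2" repo_id="$3"
  local target="$data_dir/repositories/$repo_id"

  [ -n "$repo_id" ] || { echo "Repository id must not be empty" >&2; return 1; }
  [ -d "$backup_dir" ] || { echo "No such backup folder: $backup_dir" >&2; return 1; }

  rm -rf "$target" || return 1       # drop the current contents, including subdirectories
  mkdir -p "$target" || return 1
  cp -r "$backup_dir/." "$target/"   # copy the backup contents, including hidden files
}

# Example (hypothetical paths):
# restore_repo ~/your-backups/TODAY-DATE/your-repo /opt/graphdb/data your-repo
```

Removing the old folder first, rather than copying over it, ensures that no files from the damaged or outdated repository survive next to the restored ones.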