Backup and Restore

GraphDB supports backing up and restoring an instance through its Recovery REST API. Both partial (per-repository) and full recovery procedures are available, with optional inclusion of user account data.

Note

As with all operations that involve a REST API, in order to perform a backup or a restore procedure, the respective GraphDB instance must be online.

The functionality can also be accessed from the Workbench UI under Help ‣ REST API ‣ Recovery management.

Planning a Backup

Whether you want to be able to quickly recover your data in case of failure or perform routine admin operations such as upgrading a GraphDB instance, it is important to prepare an optimal backup and restore procedure.

There are various factors to take into consideration when designing a backup strategy, such as:

  • Timing that fits your tolerance for downtime (when applying a backup) and for read-only mode (when creating a backup)

  • Available system resources, in particular enough disk space for the backup

  • Scope of the backed-up data (e.g., full or per-repository backup, or whether user accounts and settings are included)

  • Frequency of backup creation

All created backups are stored in the /data/recovery directory of the GraphDB instance.

Warning

During backup creation, the GraphDB instance on which the snapshot is created goes into read-only mode. If the GraphDB instance is run in a cluster and is currently the leader, the cluster will enter read-only mode until the process is completed.

Creating a Backup

As mentioned, a backup can cover all repositories (full backup) or only selected existing repositories (partial backup), and it may also include the user accounts and settings.

A GraphDB instance can be backed up using the /rest/recovery/backups/{name} endpoint. To create a backup, send an HTTP POST request with the backup name as a path parameter.

Note

Creating a backup requires the administrator role.

Full backup

Here is an example cURL request for creating a full instance backup with system data (i.e., user accounts and settings):

curl -X POST '<base_url>/rest/recovery/backups/<backup-name>'
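For routine backups, it can help to wrap the request above in a small script that generates a dated backup name. The following is a minimal sketch, not an official tool: the GRAPHDB_URL variable, its http://localhost:7200 fallback, the "nightly" prefix, and both function names are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: create a full backup with a dated name.
# GRAPHDB_URL and the "nightly" prefix are placeholders (assumptions).
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"

# Build a backup name such as "nightly-2024-01-31".
# The optional second argument pins the date; it defaults to today.
backup_name() {
  local prefix="$1"
  local day="${2:-$(date +%F)}"
  printf '%s-%s\n' "$prefix" "$day"
}

# POST with no body and no query parameters, as in the example above:
# a full backup that includes system data.
create_full_backup() {
  local name="$1"
  curl --fail --silent --show-error -X POST \
    "${GRAPHDB_URL}/rest/recovery/backups/${name}"
}

# Usage (requires a running instance and the administrator role):
#   create_full_backup "$(backup_name nightly)"
```

Because the instance is read-only while the backup is created, such a script is best scheduled for a period of low write activity.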

Partial backup

Here is an example cURL request for creating a partial backup without system data:

curl -X POST -H 'Content-Type: application/json' -d '{
  "repositories": [
     "<repository-name>"
  ]
}' '<base_url>/rest/recovery/backups/<backup-name>?withSystemData=false'

Note

If a POST request does not include a list of repositories, a full backup is created.

Backup options

Backup creation can be further configured via the following options:

Option

Description

withSystemData

Determines whether system data (user accounts, saved queries, visual graphs, etc.) should be included in the backup. Specified as a request parameter in the URL. Boolean, the default value is true.

repositories

List of repositories to be included in the backup. Specified as JSON in the request body.
If no repositories are provided, all repositories will be included in the backup.
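The two options can be combined in a small helper that builds the JSON body from a list of repository names. This is only a sketch under stated assumptions: GRAPHDB_URL, its fallback value, and the function names are hypothetical, and the JSON builder assumes repository names contain no quotes or backslashes.

```shell
#!/bin/sh
# Sketch: partial backup of selected repositories.
# GRAPHDB_URL is a placeholder (assumption), not a GraphDB setting.
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"

# Print {"repositories":["a","b"]} for the given repository names.
# Assumes names need no JSON escaping (no quotes or backslashes).
repositories_json() {
  local json=''
  local repo
  for repo in "$@"; do
    # Prefix a comma only when the list already has an element.
    json="${json:+$json,}\"$repo\""
  done
  printf '{"repositories":[%s]}\n' "$json"
}

# Arguments: backup name, withSystemData (true|false), repository names.
create_partial_backup() {
  local name="$1"
  local with_system_data="$2"
  shift 2
  curl --fail --silent --show-error -X POST \
    -H 'Content-Type: application/json' \
    -d "$(repositories_json "$@")" \
    "${GRAPHDB_URL}/rest/recovery/backups/${name}?withSystemData=${with_system_data}"
}

# Usage (requires a running instance):
#   create_partial_backup my-backup false repo-one repo-two
```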

Restoring from a Backup

A GraphDB instance can be restored to a backed-up state through the /rest/recovery/restore/{name} endpoint. To restore a backup, send an HTTP POST request with the backup name as a path parameter.

Note

Restoring a backup requires the administrator role.

Full restore

If we have successfully created a backup and want to completely revert to the backed-up state, we can use the following example cURL request:

curl -X POST -H 'Content-Type: application/json' '<base_url>/rest/recovery/restore/<backup-name>?cleanDataDir=true'

This request does the following:

  • All existing repositories are removed because the cleanDataDir option is set to true.

  • All repositories that are in the backup are restored.

  • User accounts and settings are replaced with those from the backup because withSystemData was not provided and the default value is true.

Note

Keep in mind that the example above also reverts system data, so all user-related changes return to their backed-up state. If the backup does not contain system data, the restore will not succeed.

Partial restore

If we have successfully created a backup and want to partially revert the state of certain repositories, we can use the following example cURL request:

curl -X POST -H 'Content-Type: application/json' -d '{
  "repositories": [
     "<repository-name>"
  ]
}' '<base_url>/rest/recovery/restore/<backup-name>?cleanDataDir=false&withSystemData=false'

This request does the following:

  • Existing repositories are kept because of cleanDataDir=false.

  • The provided repositories are restored from the backup.

  • User accounts and settings are kept because withSystemData is set to false.

Restore options

Applying a backup can be further configured via the following options:

Option

Description

cleanDataDir

If set to true, GraphDB will remove all existing repositories before applying the backup. Specified as a request parameter in the URL. Boolean, the default value is false.

withSystemData

Determines whether GraphDB should restore system data (user accounts, saved queries, visual graphs, etc.) from the backup or keep its current state. Specified as a request parameter in the URL. Boolean, the default value is true.

repositories

List of repositories to recover from the backup. Specified as JSON in the request body.
If not provided, all repositories that are in the backup will be restored.
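Since the two URL options have opposite defaults (cleanDataDir is false, withSystemData is true), it may be safer to spell out both on every restore request. The sketch below does that; GRAPHDB_URL, its fallback value, and the function names are illustrative assumptions rather than part of GraphDB.

```shell
#!/bin/sh
# Sketch: restore with both URL options always stated explicitly.
# GRAPHDB_URL is a placeholder (assumption).
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"

# Arguments: backup name, cleanDataDir, withSystemData.
# Omitted options fall back to the documented defaults (false, true).
restore_url() {
  local name="$1"
  local clean="${2:-false}"
  local system="${3:-true}"
  printf '%s/rest/recovery/restore/%s?cleanDataDir=%s&withSystemData=%s\n' \
    "$GRAPHDB_URL" "$name" "$clean" "$system"
}

# Restore all repositories contained in the backup (no request body).
restore_backup() {
  curl --fail --silent --show-error -X POST \
    -H 'Content-Type: application/json' \
    "$(restore_url "$@")"
}

# Usage (requires a running instance and the administrator role):
#   restore_backup my-backup true true    # full revert, as in "Full restore"
#   restore_backup my-backup false false  # keep repositories and accounts
```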

Restoring a Cluster

The restore procedure is meant to recover a failed node or to return repositories to a given state. For this reason, in a cluster the restore should be performed on a node that is detached from the cluster.

To recover a cluster to a given desired state, proceed as follows:

  1. Delete the cluster.

  2. (optional) If the backup is stored on only one node, copy the backup .tar file located in that node’s /data/recovery directory to the respective /data/recovery directory of each cluster node.

  3. Restore each node from the backup.

  4. Recreate the cluster group.
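Steps 2 and 3 above can be sketched as a script. Steps 1 and 4 are left out because this page does not describe how the cluster is deleted or recreated. Everything named here is an assumption for illustration: the host names, the recovery directory path, the archive file name, the use of scp, and the choice of cleanDataDir=true. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: distribute a backup archive and restore each detached node.
# Prints commands unless DRY_RUN=0 is set.

# Print the command (default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 2: copy the backup archive into one node's recovery directory.
# Arguments: local archive path, SSH host, remote recovery directory.
copy_backup() {
  local tar_file="$1"
  local host="$2"
  local recovery_dir="$3"
  run scp "$tar_file" "${host}:${recovery_dir}/"
}

# Step 3: restore one node from the named backup.
# cleanDataDir=true is an assumption, chosen so every node ends up
# with the same set of repositories.
restore_node() {
  local node_url="$1"
  local name="$2"
  run curl --fail -X POST -H 'Content-Type: application/json' \
    "${node_url}/rest/recovery/restore/${name}?cleanDataDir=true"
}

# Usage with hypothetical hosts and paths:
#   for host in node2 node3; do
#     copy_backup ./my-backup.tar "$host" /opt/graphdb/data/recovery
#   done
#   for host in node1 node2 node3; do
#     restore_node "http://${host}:7200" my-backup
#   done
```

Reviewing the dry-run output before setting DRY_RUN=0 gives a chance to confirm that every node is detached from the cluster before any restore is attempted.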

Warning

We do not recommend restoring nodes that are part of a cluster group. Attempting this might result in Byzantine behavior, as the node might recover to a state with a different number of repositories or a different transaction log, which would lead to cluster protocol failure.

Managing Backups

We can view all existing backups by sending a GET request to the /rest/recovery/backups endpoint.

To list all available backups for the given GraphDB instance using cURL, execute:

curl '<base_url>/rest/recovery/backups'

Note

As backups require storage, it is a good practice to remove outdated backups. This can be done through the /rest/recovery/backups/{name} endpoint.

To remove a backup using a cURL request, execute:

curl -X DELETE '<base_url>/rest/recovery/backups/<backup-name>'
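Removing outdated backups can be scripted as a simple retention policy. The sketch below keeps the newest N backups and deletes the rest. It assumes that backup names sort chronologically (for example, a fixed prefix followed by an ISO date) and takes the names as arguments, since the format of the list response is not described here. GRAPHDB_URL, its fallback value, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: keep the newest N backups and delete the others.
# GRAPHDB_URL is a placeholder (assumption).
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"

# Arguments: number of backups to keep, then the backup names.
# Prints the names that fall outside the newest N, one per line.
# Assumes names sort chronologically, e.g. "nightly-2024-01-31".
backups_to_prune() {
  local keep="$1"
  shift
  [ "$#" -eq 0 ] && return 0
  # Newest first, then skip the first $keep lines.
  printf '%s\n' "$@" | LC_ALL=C sort -r | tail -n "+$((keep + 1))"
}

# Delete a single backup by name.
delete_backup() {
  curl --fail --silent --show-error -X DELETE \
    "${GRAPHDB_URL}/rest/recovery/backups/$1"
}

# Usage (requires a running instance):
#   for name in $(backups_to_prune 7 nightly-2024-01-01 nightly-2024-01-02); do
#     delete_backup "$name"
#   done
```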