Backup and Restore¶
GraphDB supports backing up and restoring an instance through its Recovery REST API. Both partial (per repository) and full recovery procedures are available with optional inclusion of user account data.
Note
As with all operations that involve a REST API, in order to perform a backup or a restore procedure, the respective GraphDB instance must be online.
The functionality can also be accessed from the Workbench UI.
Planning a Backup¶
Whether you want to be able to quickly recover your data in case of failure or perform routine admin operations such as upgrading a GraphDB instance, it is important to prepare an optimal backup & restore procedure.
There are various factors to take into consideration when designing a backup strategy, such as:
Timing that fits your downtime tolerance (when applying a backup) and read-only tolerance (when creating a backup)
Available system resources, in particular enough disk space for the backup
Scope of the backed-up data (e.g., full or per-repository backup, or whether user accounts and settings are included)
Frequency of backup creation
All created backups are stored in the /data/recovery directory of the GraphDB instance.
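Since disk space is one of the planning factors listed above, a pre-flight check can be run before each backup. The following is a minimal sketch, not part of GraphDB itself: the GRAPHDB_HOME variable and the 10 GB threshold are assumptions to adapt to your installation.

```shell
#!/bin/sh
# Sketch: check free space on the file system that holds the recovery directory.
# GRAPHDB_HOME and the threshold used in the example are assumptions.

# Print the free space, in kilobytes, of the file system containing "$1".
free_kb() {
  # -P forces the portable one-line-per-file-system format, -k reports
  # 1024-byte blocks; column 4 of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the file system containing "$1" has at least "$2" KB free.
has_free_space() {
  [ "$(free_kb "$1")" -ge "$2" ]
}

# Example (not executed here): require roughly 10 GB before backing up.
#   has_free_space "$GRAPHDB_HOME/data/recovery" 10485760 \
#     || echo "Not enough free space for a backup" >&2
```

How much space a backup actually needs depends on the size of the repositories being backed up, so the threshold should be derived from your own data.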
Warning
During backup creation, the GraphDB instance on which the snapshot is created goes into read-only mode. If the GraphDB instance is run in a cluster and is currently the leader, the cluster will enter read-only mode until the process is completed.
Creating a Backup¶
As mentioned, a backup can cover either all repositories (full backup) or only selected existing repositories (partial backup), and it may also include the user accounts and settings.
A GraphDB instance can be backed up using the /rest/recovery/backups/{name} endpoint. To create a backup, send an HTTP POST request with the backup name as a path parameter.
Note
Creating a backup requires the administrator role.
Full backup¶
Here is an example cURL request for full instance backup creation with system data (i.e., user accounts and settings):
curl -X POST '<base_url>/rest/recovery/backups/<backup-name>'
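For scheduled backups, it can help to wrap the request above in a small script that generates a date-stamped name and checks the HTTP status. This is a sketch under stated assumptions: the base URL http://localhost:7200, the "nightly" prefix, HTTP Basic authentication, and the GRAPHDB_USER and GRAPHDB_PASSWORD variables are illustrative and not taken from this document.

```shell
#!/bin/sh
# Sketch: create a full backup with a date-stamped name and check the result.
# The base URL, name prefix, and Basic-auth credentials are assumptions.

# Build a backup name such as nightly-20240131 from a prefix and a date.
backup_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Build the backup endpoint URL from the base URL and the backup name.
backup_url() {
  # ${1%/} drops a trailing slash from the base URL, if present.
  printf '%s/rest/recovery/backups/%s\n' "${1%/}" "$2"
}

# POST to the backup endpoint; succeed only on a 2xx response.
create_full_backup() (
  url=$(backup_url "$1" "$2")
  # -o discards the response body, -w prints only the status code.
  status=$(curl -sS -o /dev/null -w '%{http_code}' -X POST \
    -u "$GRAPHDB_USER:$GRAPHDB_PASSWORD" "$url") || exit 1
  case "$status" in
    2??) echo "Backup $2 created (HTTP $status)" ;;
    *)   echo "Backup $2 failed (HTTP $status)" >&2; exit 1 ;;
  esac
)

# Example (not executed here):
#   create_full_backup "http://localhost:7200" \
#     "$(backup_name nightly "$(date +%Y%m%d)")"
```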
Partial backup¶
Here is an example cURL request for partial backup creation not including system data:
curl -X POST -H 'Content-Type: application/json' -d '{
"repositories": [
"<repository-name>"
]
}' '<base_url>/rest/recovery/backups/<backup-name>?withSystemData=false'
Note
If a POST request does not include a list of repositories for backup, it will automatically create a full backup.
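When several repositories go into one partial backup, the JSON body can be generated instead of written by hand. The helper below is a hypothetical sketch; it assumes repository names contain no double quotes or backslashes, and the repository and backup names in the example are placeholders.

```shell
#!/bin/sh
# Sketch: build the JSON request body for a partial backup.
# Assumes repository names contain no double quotes or backslashes.

# Print {"repositories":[...]} for the repository names given as arguments.
repositories_json() (
  body=''
  for repo in "$@"; do
    # Append a comma only when the list already has an entry.
    body="${body:+$body,}\"$repo\""
  done
  printf '{"repositories":[%s]}\n' "$body"
)

# Example (not executed here):
#   curl -X POST -H 'Content-Type: application/json' \
#     -d "$(repositories_json repo-a repo-b)" \
#     '<base_url>/rest/recovery/backups/<backup-name>?withSystemData=false'
```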
Backup options¶
Backup creation can be further configured via the following options:
Option | Description |
---|---|
withSystemData | Determines whether user account data such as user accounts, saved queries, visual graphs, etc. should be included in the backup. Specified as a request parameter in the URL. Boolean, the default value is true. |
repositories | List of repositories to be included in the backup. Specified as JSON in the request body. If no repositories are provided, all repositories will be included in the backup. |
Restoring from a Backup¶
A GraphDB instance can be restored to a backed-up state through the /rest/recovery/restore/{name} endpoint. To restore a backup, send an HTTP POST request with the backup name as a path parameter.
Note
Restoring a backup requires the administrator role.
Full restore¶
If we have successfully created a backup and want to completely revert to the backed-up state, we can use the following cURL request example:
curl -X POST -H 'Content-Type: application/json' '<base_url>/rest/recovery/restore/<backup-name>?cleanDataDir=true'
This request does the following:
All existing repositories are removed because the cleanDataDir option is set to true.
All repositories that are in the backup are restored.
User accounts and settings are replaced with those from the backup because withSystemData was not provided and the default value is true.
Note
Keep in mind that the example above also reverts system data, so all user-related changes return to their backed-up state. If the backup does not contain system data, the restore will fail.
Partial restore¶
If we have successfully created a backup and want to partially revert the state of certain repositories, we can use the following cURL request example:
curl -X POST -H 'Content-Type: application/json' -d '{
"repositories": [
"<repository-name>"
]
}' '<base_url>/rest/recovery/restore/<backup-name>?cleanDataDir=false&withSystemData=false'
This request does the following:
Existing repositories are kept because of cleanDataDir=false.
The provided repositories are restored from the backup.
User accounts and settings are kept because withSystemData is set to false.
Restore options¶
Applying a backup can be further configured via the following options:
Option | Description |
---|---|
cleanDataDir | If set to true, all existing repositories are removed before the backup is restored. Specified as a request parameter in the URL. |
withSystemData | Determines whether GraphDB should restore user account data such as user accounts, saved queries, visual graphs, etc. from a backup or keep their current state. Specified as a request parameter in the URL. Boolean, the default value is true. |
repositories | List of repositories to recover from the backup. Specified as JSON in the request body. If not provided, all repositories that are in the backup will be restored. |
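Because a mistyped option on a restore can remove repositories, it may be worth building the restore URL with explicit, validated values for both query parameters. The helper names below are hypothetical, and the base URL in the example is an assumption.

```shell
#!/bin/sh
# Sketch: build a restore URL with explicit, validated boolean options.
# Function names are illustrative; they are not part of GraphDB.

# Succeed only if "$1" is exactly "true" or "false".
is_boolean() {
  case "$1" in
    true|false) return 0 ;;
    *)          return 1 ;;
  esac
}

# Arguments: base URL, backup name, cleanDataDir, withSystemData.
restore_url() (
  if ! is_boolean "$3" || ! is_boolean "$4"; then
    echo "cleanDataDir and withSystemData must be true or false" >&2
    exit 2
  fi
  printf '%s/rest/recovery/restore/%s?cleanDataDir=%s&withSystemData=%s\n' \
    "${1%/}" "$2" "$3" "$4"
)

# Example (not executed here): restore without touching user accounts.
#   curl -X POST -H 'Content-Type: application/json' \
#     "$(restore_url http://localhost:7200 nightly-20240131 false false)"
```

Spelling out both parameters avoids relying on default values when the request is reviewed later.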
Restoring a Cluster¶
The purpose of the restore procedure is to recover a failed node or recover to a given state of repositories. This is why in a cluster, the restore should be done on a node that is detached from the cluster.
To recover a cluster to a given desired state, proceed as follows:
Delete the cluster.
(Optional) If the backup is stored on only one node, copy the backup .tar file from that node's /data/recovery directory to the /data/recovery directory of each cluster node.
Restore each node from the backup.
Recreate the cluster group.
Warning
We do not recommend restoring nodes that are part of a cluster group. Attempting this might result in Byzantine behavior, as the node might recover to a state with a different number of repositories or a different transaction log, which would lead to cluster protocol failure.
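The copy and restore steps of the procedure above can be prepared as a reviewable list of commands, one pair per node. This is only a sketch: the host names, the remote directory /opt/graphdb/data/recovery, port 7200, and the use of scp are assumptions. Deleting and recreating the cluster are not covered here and must be done separately, before and after these commands.

```shell
#!/bin/sh
# Sketch: print the copy and restore commands for each detached node.
# The remote directory, port, and use of scp are assumptions.
REMOTE_RECOVERY_DIR=${REMOTE_RECOVERY_DIR:-/opt/graphdb/data/recovery}
GRAPHDB_PORT=${GRAPHDB_PORT:-7200}

# Arguments: path to the backup .tar file, backup name, then node hosts.
# Nothing is executed; the commands are printed so they can be reviewed.
plan_cluster_restore() (
  tar_file=$1
  name=$2
  shift 2
  for node in "$@"; do
    echo "scp $tar_file $node:$REMOTE_RECOVERY_DIR/"
    echo "curl -X POST 'http://$node:$GRAPHDB_PORT/rest/recovery/restore/$name'"
  done
)

# Example (not executed here):
#   plan_cluster_restore ./recovery/<backup-file>.tar <backup-name> node1 node2 node3
```

Printing the commands rather than running them keeps the sketch safe to try, and the output can be piped to a shell once it has been checked.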
Managing Backups¶
We can view all existing backups through a simple GET request to the /rest/recovery/backups endpoint.
To list all available backups for the given GraphDB instance using cURL, execute:
curl '<base_url>/rest/recovery/backups'
Note
As backups require storage, it is a good practice to remove outdated backups. This can be done through the /rest/recovery/backups/{name} endpoint.
To remove a backup using a cURL request, execute:
curl -X DELETE '<base_url>/rest/recovery/backups/<backup-name>'
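Removing outdated backups can be scripted if backup names follow a predictable pattern. The sketch below assumes a hypothetical naming convention with a YYYYMMDD suffix (for example nightly-20240131) and HTTP Basic authentication; neither is prescribed by GraphDB. The list of names is passed in by the caller, since the format of the list response is not described here.

```shell
#!/bin/sh
# Sketch: pick backups older than a cutoff date and delete them by name.
# The <prefix>-YYYYMMDD naming convention and Basic auth are assumptions.

# Print the names whose YYYYMMDD suffix is earlier than the cutoff.
# Arguments: cutoff (YYYYMMDD), then backup names.
select_outdated() (
  cutoff=$1
  shift
  for name in "$@"; do
    stamp=${name##*-}   # text after the last hyphen
    case "$stamp" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
        if [ "$stamp" -lt "$cutoff" ]; then
          echo "$name"
        fi
        ;;
    esac
  done
)

# Send the DELETE request for one backup. Arguments: base URL, backup name.
delete_backup() {
  curl -sS -X DELETE -u "$GRAPHDB_USER:$GRAPHDB_PASSWORD" \
    "${1%/}/rest/recovery/backups/$2"
}

# Example (not executed here):
#   for old in $(select_outdated 20240115 nightly-20240101 nightly-20240131); do
#     delete_backup "http://localhost:7200" "$old"
#   done
```

Names without a date suffix are never selected, so manually created backups are left in place.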