Backup and Restore¶
GraphDB supports the backup and restore of both a single GraphDB instance and a cluster through its recovery REST API. Both partial (per-repository) and full recovery procedures are available with optional inclusion of user account data.
Note
As with all operations that involve a REST API, in order to perform a backup or a restore procedure:
The respective GraphDB instance must be online.
The cluster must be writable, i.e., the majority of its nodes must be active.
Starting with version 10.4.0, GraphDB uses LZ4 compression for backup and restore. Parallel backup compression and parallel S3 streaming are only available with licenses with four or more licensed cores.
Warning
Compressed backups created with GraphDB 10.4.0 and newer are not backward compatible and cannot be restored with older versions of GraphDB. However, restore procedures are backward compatible - in other words, in GraphDB 10.4.0 and newer you are able to restore backups created with older versions of GraphDB.
Planning a Backup¶
Whether you want to be able to quickly recover your data in case of failure or perform routine admin operations such as upgrading a GraphDB instance, it is important to prepare an optimal backup & restore procedure.
There are various factors to take into consideration when designing a backup strategy, such as:
Optimal timing for applying a backup, given how much downtime can be tolerated
Tolerance for read-only mode on a single-node setup while a backup is being created
Load-balanced backup creation (backup is created by one of the followers, so if a quorum exists, updates will be processed)
Scope of the backed-up data (e.g., full or per-repository backup, or whether user accounts and settings are included)
Available system resources, in particular enough disk space for the backup
Frequency of backup creation
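As an illustration of the disk space factor above, the following is a minimal sketch of a pre-backup check. The function name has_free_space and the threshold you pass to it are assumptions for illustration and are not part of GraphDB; the size you actually need depends on your repositories.

```shell
# Hypothetical helper: succeeds when the file system that holds DIR
# has at least REQUIRED_KB kilobytes available.
has_free_space() {
  local dir="$1"
  local required_kb="$2"
  local available_kb
  # df -P prints one POSIX-format line per file system;
  # column 4 of the second line is the available space in kilobytes (-k).
  available_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ "$available_kb" -ge "$required_kb" ]
}

# Example: require roughly 10 GB in the current directory before backing up.
# has_free_space . 10485760 || echo "not enough disk space for the backup" >&2
```

A check like this can run at the start of a scheduled backup job so that the job stops early instead of producing a truncated archive.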
Creating a Backup¶
As mentioned, a backup can cover either all repositories (full data backup) or only selected existing repositories (partial data backup), and it may also include the user accounts and settings.
Cluster backup creation is lock-free, meaning that by leveraging the multiple instances and quorum mechanism, the cluster can create a backup while simultaneously processing updates if the deployment has more than 2 nodes.
A GraphDB instance can be backed up using the /rest/recovery/backup endpoint. To create a backup, simply POST an HTTP request as shown below.
Note
Creating a backup requires the administrator role.
Backup options¶
The following parameters can be configured when creating a backup:
Option | Description |
---|---|
repositories | List of repositories to be backed up. Specified as JSON in the request body. |
backupSystemData | Determines whether user account data such as user accounts, saved queries, visual graphs, etc. should be included in the backup. Specified as JSON in the request body. Boolean, the default value is false. |
Full data backup¶
Here is an example cURL request for full data backup creation without system data (i.e., user accounts and settings):
curl -X POST -OJ -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'
This does the following:
Backs up all data in all repositories.
Does not include user accounts and settings because backupSystemData = false by default (see the above Backup options).
Creates the backup as a new file named backup-yyyy-mm-dd-hh-mm-ss.tar.
Note
This is an archive file that you do not need to extract – it is to be used as is.
To set the name of the backup yourself, replace -OJ with --output <backup-name>, i.e.:
curl -X POST --output <backup-name> -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'
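The request above can be wrapped in a small script that chooses its own timestamped file name and reports HTTP errors. This is only a sketch: the function names backup_name and create_full_backup are hypothetical, and the sketch sends no credentials, so on a secured instance you would need to add whatever authentication your deployment uses.

```shell
# Hypothetical helper: build a name such as nightly-2024-01-31-23-59-59.tar
backup_name() {
  printf '%s-%s.tar\n' "$1" "$(date +%Y-%m-%d-%H-%M-%S)"
}

# Hypothetical helper: POST a full data backup request and store the
# response under OUT_FILE. --fail makes curl exit non-zero when the
# server answers with an HTTP error status, so callers can detect failure.
create_full_backup() {
  local base_url="$1"
  local out_file="$2"
  curl --fail --silent --show-error -X POST \
    --output "$out_file" \
    -H 'Content-Type: application/json' \
    "${base_url}/rest/recovery/backup"
}

# Example:
# create_full_backup '<base_url>' "$(backup_name nightly)" || echo "backup failed" >&2
```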
Partial data backup¶
Here is an example cURL request for partial data backup creation without system data:
curl -X POST -OJ -H 'Content-Type: application/json' -d '{
"repositories":["<repo_name>"]
}' '<base_url>/rest/recovery/backup'
This does the following:
Backs up one or more repositories that are explicitly named.
Does not include user accounts and settings as backupSystemData = false by default.
You can also use --output <backup-name> instead of -OJ if you want to customize the name of the backup as shown above.
Note
If a POST request does not include a list of repositories for backup, it will automatically create a full data backup.
Full data and system backup¶
Here is an example cURL request for creating a full data backup that includes system data:
curl -X POST -OJ -H 'Content-Type: application/json' -d '{
"backupSystemData": true
}' '<base_url>/rest/recovery/backup'
This does the following:
Backs up all data in all repositories.
Backup includes user accounts and settings as backupSystemData = true is explicitly provided.
System data only backup¶
Here is an example cURL request for creating a backup of system data only:
curl -X POST -OJ -H 'Content-Type: application/json' -d '{
"repositories" : [], "backupSystemData": true
}' '<base_url>/rest/recovery/backup'
This does the following:
Backup includes user accounts and settings as backupSystemData = true is explicitly provided.
No repositories are included in the backup as repositories is an empty list (repositories: []).
Note
If the repositories parameter is not provided, all repositories will be included in the backup.
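The backup variants above differ only in the JSON request body. The sketch below shows one way to generate that body from the two documented options. The function name backup_payload and its --none flag are hypothetical conventions for this example, and the sketch assumes repository names contain no double quotes or backslashes.

```shell
# Hypothetical helper: backup_payload SYSTEM_DATA [--none | REPO...]
#   no repository arguments -> "repositories" is omitted (all repositories)
#   --none                  -> "repositories" is an empty list (system data only)
#   REPO...                 -> only the named repositories (partial backup)
backup_payload() {
  local system_data="$1"
  shift
  if [ "$#" -eq 0 ]; then
    printf '{"backupSystemData": %s}\n' "$system_data"
    return
  fi
  if [ "$1" = "--none" ]; then
    shift
  fi
  local list=""
  local repo
  for repo in "$@"; do
    # Append a quoted name, comma-separated after the first entry.
    list="${list:+$list,}\"${repo}\""
  done
  printf '{"repositories": [%s], "backupSystemData": %s}\n' "$list" "$system_data"
}

# Example:
# curl -X POST -OJ -H 'Content-Type: application/json' \
#   -d "$(backup_payload true --none)" '<base_url>/rest/recovery/backup'
```

Keeping the omitted list and the empty list as separate cases matters, because they select all repositories and no repositories respectively.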
Restoring from a Backup¶
A GraphDB instance or cluster can be restored to a backed-up state through the /rest/recovery/restore endpoint.
The recovery procedure in the cluster is treated as a simple update as it leverages the Raft protocol that allows a set of distributed nodes to act as one.
Tip
It is recommended to perform cluster transaction log truncate operations after a successful data restore, as the transaction log will use more storage space upon a backup/restore procedure.
To restore a backup, simply POST an HTTP request as shown below.
Note
Restoring a backup requires the administrator role.
Restore options¶
The following parameters can be configured when restoring from a backup:
Option | Description |
---|---|
repositories | List of repositories to recover from the backup. Specified as JSON in the request body. |
restoreSystemData | Determines whether GraphDB should restore user account data such as user accounts, saved queries, visual graphs, etc. from a backup or continue with their current state. Specified as JSON in the request body. If no system data is found in the backup, an error will be returned. Boolean, the default is false. |
removeStaleRepositories | Cleans other existing repositories on the GraphDB instance where the restore is done. The default is false. |
Full data restore preserving other repositories¶
If we have successfully created a backup and want to completely revert to the backed-up state while preserving the existing repositories on the instance where we are restoring, we can use the below cURL request example. No additional parameters are provided, meaning that defaults are applied.
curl -X POST '<base_url>/rest/recovery/restore' \
  -H 'Content-Type: multipart/form-data' \
  -F 'params={
};type=application/json' \
  -F 'file=@./<full-data-backup-name.tar>'
Note
The full-data-backup-name.tar file must be a full data backup created as shown in Full data backup above.
Full data restore with replace¶
We can also apply a backup and remove repositories that are not restored from it.
curl -X POST '<base_url>/rest/recovery/restore' \
  -H 'Content-Type: multipart/form-data' \
  -F 'params={
"removeStaleRepositories": true
};type=application/json' \
  -F 'file=@./<full-data-backup-name.tar>'
What this does:
Removes other repositories on the instance where the backup is applied as removeStaleRepositories = true.
Does a full data restore as the repositories parameter is not provided.
Partial data restore¶
Here, we need to provide the names of the repositories that we want to restore as values for the repositories parameter.
curl -X POST '<base_url>/rest/recovery/restore' \
  -H 'Content-Type: multipart/form-data' \
  -F 'params={
"repositories" : ["<repo-name>"]
};type=application/json' \
  -F 'file=@./<full-data-backup-name.tar>'
System data only restore¶
To restore only the system data from a backup, we can use the following cURL request:
curl -X POST '<base_url>/rest/recovery/restore' \
  -H 'Content-Type: multipart/form-data' \
  -F 'params={
"repositories" : [],
"restoreSystemData": true
};type=application/json' \
  -F 'file=@./<full-data-system-backup-name.tar>'
What this does:
User account data is restored as restoreSystemData = true.
No repositories are restored as the repositories parameter is an empty list ([]).
Note
The full-data-system-backup-name.tar file must contain system data, i.e., the backup must be created with backupSystemData = true, as shown in Full data and system backup above.
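The restore requests above share the same shape and differ only in the params JSON, so they can be sketched as a single helper. The function name restore_backup is hypothetical, and the sketch sends no credentials, so a secured instance would need authentication added.

```shell
# Hypothetical helper: restore_backup BASE_URL BACKUP_FILE [PARAMS_JSON]
# PARAMS_JSON defaults to an empty object, i.e. the default restore options.
restore_backup() {
  local base_url="$1"
  local backup_file="$2"
  local params="$3"
  [ -n "$params" ] || params='{}'
  if [ ! -r "$backup_file" ]; then
    echo "backup file not readable: $backup_file" >&2
    return 1
  fi
  # With -F, curl builds the multipart/form-data request itself.
  # --fail turns HTTP error responses into a non-zero exit status.
  curl --fail --silent --show-error -X POST "${base_url}/rest/recovery/restore" \
    -F "params=${params};type=application/json" \
    -F "file=@${backup_file}"
}

# Example: full data restore that removes repositories not in the backup.
# restore_backup '<base_url>' ./backup.tar '{"removeStaleRepositories": true}'
```

Checking that the archive is readable before sending anything avoids starting a restore request that curl would abort halfway.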
Creating and restoring cloud backups¶
Note
Currently, only Amazon S3 cloud storage is supported.
You can also save a backup to cloud storage and restore from a backup stored there. Cloud backup and restore have the same options as regular GraphDB backup and restore, with an additional bucketUri parameter that contains all the information about the cloud bucket. For Amazon S3, it uses the following format:
s3://[<endpoint-hostname>:<endpoint-port>]/<bucket-name>/<backup-name>?region=<AWSRegion>&AWS_ACCESS_KEY_ID=<key-id>&AWS_SECRET_ACCESS_KEY=<access-key>
The endpoint-hostname and endpoint-port values are only used for S3 compatible services. To use Amazon S3, leave these values blank, so that the URL starts with three slashes (s3:///) before the bucket name, as in the following example:
s3:///my-bucket/my-graphdb-backup/?region=eu-west-1&AWS_ACCESS_KEY_ID=secretKey&AWS_SECRET_ACCESS_KEY=secret
If either AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY is not provided, GraphDB will fall back to the standard AWS credential providers.
Starting from GraphDB 10.4.1, if region is not provided, GraphDB will attempt to resolve the configured region through the default AWS region provider chain. If that fails, it will default to us-east-1 (US East, N. Virginia). Furthermore, if the provided region does not match the one where the bucket is located, once a connection to the bucket has been established, all subsequent requests will automatically be directed to the region the bucket resides in.
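To make the URI format concrete, here is a minimal sketch that assembles a bucketUri from its parts. The function name s3_bucket_uri is hypothetical. The sketch deliberately leaves out AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, relying on the credential provider fallback described above, and it assumes the bucket and backup names need no URL encoding.

```shell
# Hypothetical helper: s3_bucket_uri BUCKET BACKUP_NAME REGION [ENDPOINT_HOST:PORT]
# Without an endpoint the result starts with s3:/// (Amazon S3);
# with one it targets an S3 compatible service.
s3_bucket_uri() {
  local bucket="$1"
  local backup_name="$2"
  local region="$3"
  local endpoint="${4:-}"
  printf 's3://%s/%s/%s?region=%s\n' "$endpoint" "$bucket" "$backup_name" "$region"
}

# Examples:
# s3_bucket_uri my-bucket my-graphdb-backup eu-west-1
# s3_bucket_uri my-bucket my-graphdb-backup eu-west-1 storage.example.com:9000
```

Keeping the keys out of the URI also keeps them out of shell history and process listings.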
In addition to the default backup options, you can also set up these global parameters in your graphdb.properties file before starting your GraphDB instance:
Property name | Description | Default value |
---|---|---|
 | Enables TLS for secure connection to S3 compatible services | |
 | Timeout in seconds for a cloud backup’s single part upload | |
Creating a cloud backup¶
The GraphDB instance uses a different endpoint when creating a backup saved in the cloud: /rest/recovery/cloud-backup.
Here is an example cURL request for full data backup creation with system data:
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
"backupOptions": { "backupSystemData": true },
"bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-backup'
The backupOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.
The backup examples from above are also valid for the cloud backup. As long as the cloud backup is provided with the same backupOptions and the bucketUri is valid, the resulting backup .tar file should be the same.
Restoring from a cloud backup¶
The GraphDB instance uses a different endpoint when restoring from a backup saved on cloud storage: /rest/recovery/cloud-restore.
Here is an example cURL request for applying a backup and removing all repositories that are not restored from it:
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
"restoreOptions": { "removeStaleRepositories": true },
"bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-restore'
The restoreOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.
The restore examples from above are also valid for the cloud restore. As long as the cloud restore endpoint is provided with the same restoreOptions and the bucketUri points to a valid GraphDB backup file, the resulting restore should be the same.
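Since the cloud backup and cloud restore requests have the same structure, both can be sketched with one pair of helpers. The function names cloud_request_body and cloud_request are hypothetical. The sketch assumes the bucketUri contains no double quotes or backslashes and sends no GraphDB credentials.

```shell
# Hypothetical helper: cloud_request_body backup|restore BUCKET_URI [OPTIONS_JSON]
# Omitting OPTIONS_JSON leaves out backupOptions/restoreOptions,
# so the default option values apply.
cloud_request_body() {
  local kind="$1"
  local bucket_uri="$2"
  local options="$3"
  if [ -n "$options" ]; then
    printf '{"%sOptions": %s, "bucketUri": "%s"}\n' "$kind" "$options" "$bucket_uri"
  else
    printf '{"bucketUri": "%s"}\n' "$bucket_uri"
  fi
}

# Hypothetical helper: cloud_request BASE_URL backup|restore BODY_JSON
# Posts to /rest/recovery/cloud-backup or /rest/recovery/cloud-restore.
cloud_request() {
  local base_url="$1"
  local kind="$2"
  local body="$3"
  curl --fail --silent --show-error -X POST \
    -H 'Content-Type: application/json' -H 'Accept: application/json' \
    -d "$body" "${base_url}/rest/recovery/cloud-${kind}"
}

# Example:
# body="$(cloud_request_body restore 's3:///<bucket_name>/<backup_name>?region=<region>' \
#   '{"removeStaleRepositories": true}')"
# cloud_request '<base_url>' restore "$body"
```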
Monitoring your recovery operations¶
You can monitor your backups through the Workbench. The backup monitoring interface displays the backups and restores that are currently underway, along with additional information such as the recovery operation type (backup or restore), the user who initiated the procedure, the affected repositories, the time elapsed since the procedure was initiated, and the snapshot options.
This interface also allows you to temporarily Pause updates of the table so that you can copy text from it.

Warning
The Pause button doesn’t pause the recovery operation itself – it merely freezes the information presented in Workbench and prevents the table from updating.
You can also access this interface through the Global Monitoring notification area at the top of every Workbench view.