Backup and Restore
GraphDB supports the backup and restore of both a single GraphDB instance and a cluster through its recovery REST API. Both partial (per-repository) and full recovery procedures are available with optional inclusion of user account data.
Important
As with all operations that involve a REST API, in order to perform a backup or a restore procedure:
The respective GraphDB instance must be online.
The cluster must be writable, i.e., the majority of its nodes must be active.
Planning a Backup
Whether you want to be able to quickly recover your data in case of failure or perform routine admin operations such as upgrading a GraphDB instance, it is important to prepare an optimal backup & restore procedure.
There are various factors to take into consideration when designing a backup strategy, such as:
Timing of a restore, given how much downtime can be tolerated while a backup is applied
Tolerance for a read-only period on a single-node setup while a backup is created
Load-balanced backup creation (backup is created by one of the followers, so if a quorum exists, updates will be processed)
Scope of the backed-up data (e.g., full or per-repository backup, or whether user accounts and settings are included)
Available system resources, in particular enough disk space for the backup
Frequency of backup creation
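The disk space factor above can be checked automatically before each backup. The following is a minimal sketch, assuming a POSIX shell with df and awk available; the helper name has_free_kb and any threshold passed to it are hypothetical and not part of GraphDB.

```shell
# Hypothetical helper: succeeds only if DIR has at least REQUIRED_KB kilobytes free.
has_free_kb() {
  dir=$1
  required_kb=$2
  # df -P prints one header line and one data line per file system;
  # with -k, column 4 of the data line is the available space in kilobytes.
  available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$available_kb" -ge "$required_kb" ]
}
```

For example, a backup script could start with has_free_kb /var/backups 10485760 || exit 1 to stop early when less than roughly 10 GB is free. Both the directory and the threshold are placeholders that should be sized against your own repositories.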
Creating a Backup
A backup can cover either all repositories (full data backup) or only selected existing repositories (partial data backup), and it may also include the user accounts and settings.
Cluster backup creation is lock-free: by leveraging the multiple instances and the quorum mechanism, the cluster can create a backup while simultaneously processing updates, provided the deployment has more than two nodes.
A GraphDB instance can be backed up using the /rest/recovery/backup endpoint. To create a backup, POST an HTTP request as shown below.
Note
Creating a backup requires the administrator role.
Backup options
The following parameters can be configured when creating a backup:
Option | Description
---|---
repositories | List of repositories to be backed up. Specified as JSON in the request body.
backupSystemData | Determines whether user account data such as user accounts, saved queries, visual graphs, etc. should be included in the backup. Specified as JSON in the request body. Boolean, the default value is false.
Full data backup
Here is an example cURL request for full data backup creation without system data (i.e., user accounts and settings):
curl -X POST -OJ -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'
This does the following:
Backs up all data in all repositories.
Does not include user accounts and settings because backupSystemData = false by default (see the Backup options above).
Creates the backup as a new file named in the format backup-yyyy-mm-dd-hh-mm-ss.tar.
Note
This is an archive file that you do not need to extract – it is to be used as is.
To set the name of the backup yourself, replace -OJ with --output <backup-name>, i.e.:
curl -X POST --output <backup-name> -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'
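When backups are scripted, a dated file name and a quick readability check can be useful. The sketch below assumes a POSIX shell with date and tar, and assumes the backup is a standard tar archive, as its .tar extension suggests. The helper names backup_file_name and is_readable_tar are hypothetical. Listing the archive only reads it and extracts nothing, so the file can still be used as is.

```shell
# Hypothetical helper: prints PREFIX-yyyy-mm-dd.tar using today's date.
backup_file_name() {
  printf '%s-%s.tar\n' "$1" "$(date +%Y-%m-%d)"
}

# Hypothetical helper: succeeds only if FILE can be listed as a tar archive.
# Nothing is extracted; the listing is discarded.
is_readable_tar() {
  tar -tf "$1" > /dev/null 2>&1
}
```

A script could pass "$(backup_file_name graphdb-full)" as the value of --output in the request above and then call is_readable_tar on the resulting file to catch truncated downloads or error responses that were saved in place of an archive.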
Partial data backup
Here is an example cURL request for partial data backup creation without system data:
curl -X POST -OJ -H 'Content-Type: application/json' -d '{
"repositories":["<repo_name>"]
}' '<base_url>/rest/recovery/backup'
This does the following:
Backs up one or more repositories that are explicitly named.
Does not include user accounts and settings because backupSystemData = false by default.
You can also use --output <backup-name> instead of -OJ if you want to customize the name of the backup, as shown above.
Note
If a POST request does not include a list of repositories for backup, it will automatically create a full data backup.
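To back up several repositories in one request, the repositories list simply holds more than one name. The following sketch builds that JSON body from shell arguments. The helper name backup_payload is hypothetical, and the function assumes repository names contain no double quotes or backslashes, since it performs no JSON escaping.

```shell
# Hypothetical helper: prints {"repositories":["a","b",...]} for the given names.
# No JSON escaping is done, so names must not contain double quotes or backslashes.
backup_payload() {
  list=""
  for repo in "$@"; do
    # Append a comma before every element except the first.
    list="${list:+$list,}\"$repo\""
  done
  printf '{"repositories":[%s]}\n' "$list"
}
```

The output can be passed to cURL with -d "$(backup_payload repo1 repo2)". Note that calling the helper with no arguments prints an empty list, which selects no repositories at all, whereas omitting the repositories parameter entirely produces a full data backup.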
Full data and system backup
Here is an example cURL request for creating a full data backup that also includes system data:
curl -X POST -OJ -H 'Content-Type: application/json' -d '{
"backupSystemData": true
}' '<base_url>/rest/recovery/backup'
This does the following:
Backs up all data in all repositories.
Includes user accounts and settings because backupSystemData = true is explicitly provided.
System data only backup
Here is an example cURL request for creating a backup of system data only:
curl -X POST -OJ -H 'Content-Type: application/json' -d '{
"repositories" : [], "backupSystemData": true
}' '<base_url>/rest/recovery/backup'
This does the following:
Includes user accounts and settings because backupSystemData = true is explicitly provided.
Does not include any repositories because repositories is an empty list (repositories: []).
Note
If the repositories parameter is not provided, all repositories will be included in the backup.
Cloud backup
Important
Currently, only Amazon S3 cloud storage is supported.
To create a backup saved in the cloud, the GraphDB instance uses a different endpoint: /rest/recovery/cloud-backup.
Cloud backup has the same options as regular GraphDB backup, with an additional bucketUri parameter that contains all the information about the cloud bucket. For Amazon S3, it uses the following format:
s3://[<endpoint-hostname>:<endpoint-port>]/<bucket-name>/<backup-name>?region=<AWSRegion>&AWS_ACCESS_KEY_ID=<key-id>&AWS_SECRET_ACCESS_KEY=<access-key>
The endpoint-hostname and endpoint-port values are only used for local S3 clones. To use Amazon S3, leave these values blank so that the URL starts with three / characters before the bucket name, as below:
s3:///my-bucket/graphdb-backups/<backup-name>?region=eu-west-1&AWS_ACCESS_KEY_ID=secretKey&AWS_SECRET_ACCESS_KEY=secret
Here is an example cURL request for full data backup creation with system data:
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
"backupOptions": { "backupSystemData": true },
"bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-backup'
The backupOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.
The backup examples from above are also valid for the cloud backup. As long as the cloud backup is provided with the same backupOptions and the bucketUri is valid, the resulting backup .tar file should be the same.
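Because the bucketUri carries the credentials inline, it can be convenient to assemble it from variables rather than typing it into each command. The sketch below follows the Amazon S3 format shown above, with the endpoint left blank. The helper names s3_bucket_uri and cloud_backup_payload are hypothetical. Whether special characters in the keys need percent-encoding is not stated in this document, so the values are passed through unchanged.

```shell
# Hypothetical helper: builds an Amazon S3 bucketUri (blank endpoint, hence "s3:///").
# Arguments: bucket name, backup name, region, access key ID, secret access key.
# Values are inserted as given; no percent-encoding is applied.
s3_bucket_uri() {
  printf 's3:///%s/%s?region=%s&AWS_ACCESS_KEY_ID=%s&AWS_SECRET_ACCESS_KEY=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: builds the cloud-backup request body.
# Arguments: bucketUri, then true or false for backupSystemData.
cloud_backup_payload() {
  printf '{"backupOptions":{"backupSystemData":%s},"bucketUri":"%s"}\n' "$2" "$1"
}
```

The body can then be passed to the cloud-backup request with -d "$(cloud_backup_payload "$uri" true)", where uri holds the output of s3_bucket_uri. Reading the two keys from environment variables instead of literals keeps them out of the script itself.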
Restoring from a Backup
A GraphDB instance or cluster can be restored to a backed-up state through the /rest/recovery/restore endpoint.
The recovery procedure in the cluster is treated as a simple update as it leverages the Raft protocol that allows a set of distributed nodes to act as one.
Important
It is recommended to truncate the cluster transaction log after a successful data restore, because a backup/restore procedure increases the storage space used by the transaction log.
To restore a backup, simply POST an HTTP request as shown below.
Note
Restoring a backup requires the administrator role.
Restore options
The following parameters can be configured when restoring from a backup:
Option | Description
---|---
repositories | List of repositories to recover from the backup. Specified as JSON in the request body.
restoreSystemData | Determines whether GraphDB should restore user account data such as user accounts, saved queries, visual graphs, etc. from a backup or continue with their current state. Specified as JSON in the request body. If no system data is found in the backup, an error will be returned. Boolean, the default is false.
removeStaleRepositories | Cleans other existing repositories on the GraphDB instance where the restore is done. The default is false.
Full data restore preserving other repositories
If we have successfully created a backup and want to completely revert to the backed-up state while preserving the existing repositories on the instance where we are restoring, we can use the below cURL request example. No additional parameters are provided, meaning that defaults are applied.
curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={};type=application/json' \
-F 'file=@./<full-data-backup-name.tar>'
Note
The full-data-backup-name.tar file must be a full data backup, created as shown in the Full data backup section.
Full data restore with replace
We can also apply a backup and remove repositories that are not restored from it.
curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
"removeStaleRepositories": true
};type=application/json' \
-F 'file=@./<full-data-backup-name.tar>'
What this does:
Removes other repositories on the instance where the backup is applied because removeStaleRepositories = true.
Does a full data restore because the repositories parameter is not provided.
Partial data restore
Here, we need to provide the names of the repositories that we want to restore as values for the repositories parameter.
curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
"repositories" : ["<repo-name>"]
};type=application/json' \
-F 'file=@./<full-data-backup-name.tar>'
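The params form field is easy to get wrong by hand: it is a JSON object followed by exactly one ;type=application/json suffix. The sketch below assembles it from repository names. The helper names restore_options_json and restore_params_field are hypothetical, and the code assumes repository names contain no double quotes, backslashes, or semicolons, since it does no escaping.

```shell
# Hypothetical helper: prints {"repositories":["a","b",...]} for the given names.
# No escaping is done, so names must not contain quotes, backslashes, or semicolons.
restore_options_json() {
  list=""
  for repo in "$@"; do
    # Append a comma before every element except the first.
    list="${list:+$list,}\"$repo\""
  done
  printf '{"repositories":[%s]}\n' "$list"
}

# Hypothetical helper: wraps a JSON object as the value of cURL's -F option.
# The "params" field carries the JSON; ";type=" sets that part's content type.
restore_params_field() {
  printf 'params=%s;type=application/json\n' "$1"
}
```

The result can replace the hand-written field in the request above, as in -F "$(restore_params_field "$(restore_options_json repo1 repo2)")", followed by the usual file field.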
System data only restore
To restore only the system data from a backup, we can use the following cURL request:
curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
"repositories" : [],
"restoreSystemData": true
};type=application/json' \
-F 'file=@./<full-data-system-backup-name.tar>'
What this does:
Restores user account data because restoreSystemData = true.
Does not restore any repositories because the repositories parameter is an empty list ([]).
Note
The full-data-system-backup-name.tar file must contain system data, i.e., the backup must be created with backupSystemData = true, as shown in the Full data and system backup section.
Cloud restore
Important
Currently, only Amazon S3 cloud storage is supported.
To restore from a backed-up state saved on cloud storage, the GraphDB instance uses a different endpoint: /rest/recovery/cloud-restore.
Cloud restore has the same options as regular GraphDB restore, with an additional bucketUri parameter that contains all the information about the cloud bucket. For Amazon S3, it uses the following format:
s3://[<endpoint-hostname>:<endpoint-port>]/<bucket-name>/<backup-name>?region=<AWSRegion>&AWS_ACCESS_KEY_ID=<key-id>&AWS_SECRET_ACCESS_KEY=<access-key>
The endpoint-hostname and endpoint-port values are only used for local S3 clones. To use Amazon S3, leave these values blank so that the URL starts with three / characters before the bucket name, as below:
s3:///my-bucket/graphdb-backups/<backup-name>?region=eu-west-1&AWS_ACCESS_KEY_ID=secretKey&AWS_SECRET_ACCESS_KEY=secret
Here is an example cURL request for applying a backup and removing all repositories that are not restored from it:
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
"restoreOptions": { "removeStaleRepositories": true },
"bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-restore'
The restoreOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.
The restore examples from above are also valid for the cloud restore. As long as the cloud restore endpoint is provided with the same restoreOptions and the bucketUri points to a valid GraphDB backup file, the resulting restore should be the same.