Backup and Restore

GraphDB supports the backup and restore of both a single GraphDB instance and a cluster through its recovery REST API. Both partial (per-repository) and full recovery procedures are available with optional inclusion of user account data.

Important

As with all operations that involve a REST API, in order to perform a backup or a restore procedure:

  • The respective GraphDB instance must be online.

  • The cluster must be writable, i.e., the majority of its nodes must be active.

Planning a Backup

Whether you want to be able to quickly recover your data in case of failure or perform routine admin operations such as upgrading a GraphDB instance, it is important to prepare an optimal backup & restore procedure.

There are various factors to take into consideration when designing a backup strategy, such as:

  • Optimal timing for applying a backup, given how much downtime can be tolerated

  • Tolerance for a read-only period on a single-node setup while a backup is created

  • Load-balanced backup creation (backup is created by one of the followers, so if a quorum exists, updates will be processed)

  • Scope of the backed-up data (e.g., full or per-repository backup, or whether user accounts and settings are included)

  • Available system resources and specifically ensuring enough disk space for backup

  • Frequency of backup creation
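
As an illustration of the disk space factor above, here is a minimal sketch of a pre-flight check that could run before a backup is requested. The function name has_free_space and the idea of passing a required size in kilobytes are assumptions made for this example. GraphDB does not provide this check, and the right threshold depends on the size of your repositories.

```shell
# has_free_space DIR REQUIRED_KB
# Succeeds (exit status 0) if the filesystem that holds DIR has at least
# REQUIRED_KB kilobytes available, as reported by POSIX `df -Pk`.
# Hypothetical helper for illustration only.
has_free_space() (
    dir=$1
    required_kb=$2
    # -P forces the portable single-line format; column 4 is "Available".
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$available_kb" ] && [ "$available_kb" -ge "$required_kb" ]
)

# Example use (the target directory and the 10 GB threshold are placeholders):
#   has_free_space /path/to/backup-dir 10485760 || echo "Not enough space" >&2
```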

Creating a Backup

As mentioned, a backup can cover either all repositories (full data backup) or only selected existing repositories (partial data backup), and it may also include the user accounts and settings.

Cluster backup creation is lock-free, meaning that by leveraging the multiple instances and quorum mechanism, the cluster can create a backup while simultaneously processing updates if the deployment has more than 2 nodes.

A GraphDB instance can be backed up using the /rest/recovery/backup endpoint. To create a backup, send an HTTP POST request as shown below.

Note

Creating a backup requires the administrator role.

Backup options

The following parameters can be configured when creating a backup:

Option

Description

repositories

List of repositories to be backed up. Specified as JSON in the request body.

  • If the parameter is missing, all repositories will be included in the backup.

  • If it is an empty list ([]), no repositories will be included in the backup.

  • Otherwise, the repositories from the list will be included in the backup.

backupSystemData

Determines whether system data, such as user accounts, saved queries, and visual graphs, should be included in the backup. Specified as JSON in the request body. Boolean; the default value is false.
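
The three cases for the repositories parameter (missing, empty list, explicit list) are easy to mix up, so here is a minimal sketch of a helper that prints the matching JSON request body. The name backup_body and its "all"/"list" scope argument are assumptions made for this example, not part of GraphDB. The sketch also assumes repository names contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# backup_body SYSTEM_DATA SCOPE [REPO...]
#   SYSTEM_DATA  true or false, used as the value of backupSystemData
#   SCOPE        "all"  -> omit "repositories" (every repository is backed up)
#                "list" -> send exactly the REPO names that follow
#                          (no names gives the empty list [])
# Hypothetical helper for illustration only; no JSON escaping is performed.
backup_body() (
    system_data=$1
    scope=$2
    shift 2
    if [ "$scope" = all ]; then
        printf '{"backupSystemData": %s}\n' "$system_data"
    else
        list=
        for repo in "$@"; do
            # Append each name as a quoted string, comma-separated.
            list="${list:+$list, }\"$repo\""
        done
        printf '{"repositories": [%s], "backupSystemData": %s}\n' "$list" "$system_data"
    fi
)

# Example use with the endpoint described in this section:
#   curl -X POST -OJ -H 'Content-Type: application/json' \
#        -d "$(backup_body true list)" '<base_url>/rest/recovery/backup'
```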

Full data backup

Here is an example cURL request for full data backup creation without system data (i.e., user accounts and settings):

curl -X POST -OJ -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'

This does the following:

  • Backs up all data in all repositories.

  • Does not include user accounts and settings because backupSystemData = false by default (see the above Backup options).

  • Saves the backup as a new file named in the format backup-yyyy-mm-dd-hh-mm-ss.tar.

Note

This is an archive file that you do not need to extract – it is to be used as is.

To set the name of the backup yourself, replace -OJ with --output <backup-name>, i.e.:

curl -X POST --output <backup-name> -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'
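
Since the backup is a .tar archive that is used as is, one way to sanity-check a downloaded file without extracting it is to list its contents. The sketch below assumes the backup is a standard tar archive that the system tar command can read. The name is_readable_tar is made up for this example, and the check only confirms that the file is a non-empty, readable archive, not that GraphDB will accept it for a restore.

```shell
# is_readable_tar FILE
# Succeeds if FILE exists, is not empty, and can be listed by tar.
# Listing (-t) reads the archive headers without extracting anything.
# Hypothetical helper for illustration only.
is_readable_tar() {
    [ -s "$1" ] && tar -tf "$1" > /dev/null 2>&1
}

# Example use (the file name is a placeholder):
#   is_readable_tar ./my-backup.tar || echo "Backup file looks damaged" >&2
```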

Partial data backup

Here is an example cURL request for partial data backup creation without system data:

curl -X POST -OJ -H 'Content-Type: application/json' -d '{
   "repositories":["<repo_name>"]
}' '<base_url>/rest/recovery/backup'

This does the following:

  • Backs up only the repositories that are explicitly named.

  • Does not include user accounts and settings because backupSystemData = false by default.

You can also use --output <backup-name> instead of -OJ if you want to customize the name of the backup as shown above.

Note

If a POST request does not include a list of repositories for backup, it will automatically create a full data backup.

Full data and system backup

Here is an example cURL request for creating a full data backup that also includes system data:

curl -X POST -OJ -H 'Content-Type: application/json' -d '{
   "backupSystemData": true
}' '<base_url>/rest/recovery/backup'

This does the following:

  • Backs up all data in all repositories.

  • Includes user accounts and settings because backupSystemData = true is explicitly provided.

System data only backup

Here is an example cURL request for creating a backup of system data only:

curl -X POST -OJ -H 'Content-Type: application/json' -d '{
   "repositories" : [], "backupSystemData": true
}' '<base_url>/rest/recovery/backup'

This does the following:

  • Includes user accounts and settings because backupSystemData = true is explicitly provided.

  • Does not include any repositories because repositories is an empty list (repositories: []).

    Note

    If this parameter is not provided, all repositories will be included in the backup.

Cloud backup

Important

Currently, only Amazon S3 cloud storage is supported.

To create a backup saved in the cloud, the GraphDB instance uses a different endpoint – /rest/recovery/cloud-backup.

Cloud backup has the same options as regular GraphDB backup, with an additional bucketUri parameter that contains all the information about the cloud bucket. For Amazon S3, it uses the following format:

s3://[<endpoint-hostname>:<endpoint-port>]/<bucket-name>/<backup-name>?region=<AWSRegion>&AWS_ACCESS_KEY_ID=<key-id>&AWS_SECRET_ACCESS_KEY=<access-key>

The endpoint-hostname and endpoint-port values are only used for local S3 clones. To use Amazon S3, these values should be left blank and the URL should start with three / before the bucket, as below:

s3:///my-bucket/graphdb-backups/<backup-name>?region=eu-west-1&AWS_ACCESS_KEY_ID=secretKey&AWS_SECRET_ACCESS_KEY=secret
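
To make the format above concrete, here is a minimal sketch that assembles a bucketUri from its parts. The function name s3_bucket_uri and its argument order are assumptions made for this example. It does no URL encoding, and the source does not say how credentials containing characters such as & or / should be written, so treat that case as unverified.

```shell
# s3_bucket_uri BUCKET BACKUP_NAME REGION KEY_ID SECRET [ENDPOINT]
# Prints a bucketUri in the format described above.
#   ENDPOINT  optional host:port of a local S3 clone; leave it out for
#             Amazon S3, which gives the three slashes before the bucket.
# Hypothetical helper for illustration only; values are inserted as given.
s3_bucket_uri() (
    bucket=$1
    backup=$2
    region=$3
    key_id=$4
    secret=$5
    endpoint=${6:-}
    printf 's3://%s/%s/%s?region=%s&AWS_ACCESS_KEY_ID=%s&AWS_SECRET_ACCESS_KEY=%s\n' \
        "$endpoint" "$bucket" "$backup" "$region" "$key_id" "$secret"
)

# Example use (all values are placeholders):
#   s3_bucket_uri my-bucket graphdb-backups/nightly eu-west-1 "$KEY_ID" "$SECRET"
```

Keeping the key ID and secret in variables rather than typing them into the command keeps them out of the shell history, although they still end up in the request body.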

Here is an example cURL request for full data backup creation with system data:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "backupOptions": { "backupSystemData": true },
  "bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-backup'

The backupOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.

The backup examples from above are also valid for the cloud backup. As long as the cloud backup is provided with the same backupOptions and the bucketUri is valid, the resulting backup .tar file should be the same.
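
Because backupOptions is optional, a cloud backup request body comes in two shapes. The following sketch prints either one. The name cloud_backup_body is an assumption made for this example, and the options argument is expected to be an already valid JSON object, which the sketch does not check.

```shell
# cloud_backup_body BUCKET_URI [BACKUP_OPTIONS_JSON]
# Prints the JSON body for /rest/recovery/cloud-backup.
# When BACKUP_OPTIONS_JSON is omitted, "backupOptions" is left out so that
# the default option values apply.
# Hypothetical helper for illustration only; the inputs are not validated.
cloud_backup_body() (
    bucket_uri=$1
    options=${2:-}
    if [ -n "$options" ]; then
        printf '{"backupOptions": %s, "bucketUri": "%s"}\n' "$options" "$bucket_uri"
    else
        printf '{"bucketUri": "%s"}\n' "$bucket_uri"
    fi
)

# Example use with the endpoint described in this section:
#   curl -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' \
#        -d "$(cloud_backup_body "$BUCKET_URI" '{"backupSystemData": true}')" \
#        '<base_url>/rest/recovery/cloud-backup'
```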

Restoring from a Backup

A GraphDB instance or cluster can be restored to a backed-up state through the /rest/recovery/restore endpoint.

The recovery procedure in the cluster is treated as a simple update as it leverages the Raft protocol that allows a set of distributed nodes to act as one.

Important

It is recommended to truncate the cluster transaction log after a successful data restore, because the transaction log takes up more storage space after a backup/restore procedure.

To restore a backup, send an HTTP POST request as shown below.

Note

Restoring a backup requires the administrator role.

Restore options

The following parameters can be configured when restoring from a backup:

Option

Description

repositories

List of repositories to recover from the backup. Specified as JSON in the request body.

  • If the parameter is missing, all repositories that are in the backup will be restored.

  • If it is an empty list ([]), no repositories from the backup will be restored.

  • Otherwise, the repositories from the list will be restored.

restoreSystemData

Determines whether GraphDB should restore system data, such as user accounts, saved queries, and visual graphs, from the backup or keep its current state. Specified as JSON in the request body. If restoreSystemData is true but no system data is found in the backup, an error will be returned. Boolean; the default is false.

removeStaleRepositories

Removes the existing repositories on the target GraphDB instance that are not restored from the backup. The default is false, meaning that no repositories will be removed.
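
The restore options are sent as a JSON form field named params, followed by ;type=application/json. The sketch below prints that field value for a given combination of options, so it can be passed to curl with -F. The name restore_params and its "all"/"list" scope argument are assumptions made for this example. As with any hand-built JSON, it assumes repository names contain no double quotes or backslashes.

```shell
# restore_params RESTORE_SYSTEM_DATA REMOVE_STALE SCOPE [REPO...]
#   RESTORE_SYSTEM_DATA  true or false (restoreSystemData)
#   REMOVE_STALE         true or false (removeStaleRepositories)
#   SCOPE                "all"  -> omit "repositories" (restore everything
#                                  contained in the backup)
#                        "list" -> restore exactly the REPO names that follow
#                                  (no names gives the empty list [])
# Hypothetical helper for illustration only; no JSON escaping is performed.
restore_params() (
    restore_system=$1
    remove_stale=$2
    scope=$3
    shift 3
    json=$(printf '"restoreSystemData": %s, "removeStaleRepositories": %s' \
        "$restore_system" "$remove_stale")
    if [ "$scope" = list ]; then
        list=
        for repo in "$@"; do
            list="${list:+$list, }\"$repo\""
        done
        json="\"repositories\": [$list], $json"
    fi
    printf 'params={%s};type=application/json\n' "$json"
)

# Example use with the endpoint described in this section:
#   curl -X POST '<base_url>/rest/recovery/restore' \
#        -F "$(restore_params false true all)" \
#        -F 'file=@./<full-data-backup-name.tar>'
```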

Full data restore preserving other repositories

If we have successfully created a backup and want to completely revert to the backed-up state while preserving the existing repositories on the instance where we are restoring, we can use the below cURL request example. No additional parameters are provided, meaning that defaults are applied.

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        };type=application/json' \
-F file=@./<full-data-backup-name.tar>

Note

The full-data-backup-name.tar file must be a full data backup created as shown in the Full data backup section above.

Full data restore with replace

We can also apply a backup and remove repositories that are not restored from it.

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        "removeStaleRepositories": true
    };type=application/json' \
-F file=@./<full-data-backup-name.tar>

What this does:

  • Removes other repositories on the instance where the backup is applied as removeStaleRepositories = true.

  • Does a full data restore as the repositories parameter is not provided.

Partial data restore

Here, we need to provide the names of the repositories that we want to restore as values for the repositories parameter.

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        "repositories" : ["<repo-name>"]
    };type=application/json' \
-F file=@./<full-data-backup-name.tar>

System data only restore

To restore only the system data from a backup, we can use the following cURL request:

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        "repositories" : [],
        "restoreSystemData": true
    };type=application/json' \
-F file=@./<full-data-system-backup-name.tar>

What this does:

  • User account data is restored as restoreSystemData = true.

  • No repositories are restored as the repositories parameter is an empty list ([]).

    Note

    The full-data-system-backup-name.tar file must contain system data, i.e., the backup must be created with backupSystemData = true as shown in the Full data and system backup section above.

Cloud restore

Important

Currently, only Amazon S3 cloud storage is supported.

To restore from a backed-up state saved on cloud storage, the GraphDB instance uses a different endpoint – /rest/recovery/cloud-restore.

Cloud restore has the same options as regular GraphDB restore, with an additional bucketUri parameter that contains all the information about the cloud bucket. For Amazon S3, it uses the following format:

s3://[<endpoint-hostname>:<endpoint-port>]/<bucket-name>/<backup-name>?region=<AWSRegion>&AWS_ACCESS_KEY_ID=<key-id>&AWS_SECRET_ACCESS_KEY=<access-key>

The endpoint-hostname and endpoint-port values are only used for local S3 clones. To use Amazon S3, these values should be left blank and the URL should start with three / before the bucket, as below:

s3:///my-bucket/graphdb-backups/<backup-name>?region=eu-west-1&AWS_ACCESS_KEY_ID=secretKey&AWS_SECRET_ACCESS_KEY=secret
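
A common mistake with this format is dropping one of the three slashes, which turns the bucket name into an endpoint hostname. The sketch below classifies a bucketUri by its prefix before it is sent. The name uses_custom_s3_endpoint and its exit codes are assumptions made for this example, and the check looks only at the prefix, not at the rest of the URI.

```shell
# uses_custom_s3_endpoint BUCKET_URI
# Exit status:
#   0  the URI names an endpoint host (s3://host:port/...), i.e. a local S3 clone
#   1  the URI has an empty endpoint (s3:///bucket/...), i.e. Amazon S3
#   2  the URI does not start with s3://
# Hypothetical helper for illustration only; only the prefix is inspected.
uses_custom_s3_endpoint() {
    case $1 in
        s3:///*) return 1 ;;
        s3://*)  return 0 ;;
        *)       return 2 ;;
    esac
}

# Example use:
#   if uses_custom_s3_endpoint "$BUCKET_URI"; then
#       echo "Warning: this URI targets a custom endpoint, not Amazon S3" >&2
#   fi
```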

Here is an example cURL request for applying a backup and removing all repositories that are not restored from it:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "restoreOptions": { "removeStaleRepositories": true },
  "bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-restore'

The restoreOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.

The restore examples from above are also valid for the cloud restore. As long as the cloud restore endpoint is provided with the same restoreOptions and the bucketUri points to a valid GraphDB backup file, the resulting restore should be the same.