Backup and Restore

GraphDB supports the backup and restore of both a single GraphDB instance and a cluster through its recovery REST API. Both partial (per-repository) and full recovery procedures are available with optional inclusion of user account data.

Note

As with all operations that involve a REST API, in order to perform a backup or a restore procedure:

  • The respective GraphDB instance must be online.

  • The cluster must be writable, i.e., the majority of its nodes must be active.

Starting with version 10.4.0, GraphDB uses LZ4 compression for backup and restore. Parallel backup compression and parallel S3 streaming are only available with licenses with four or more licensed cores.

Warning

Compressed backups created with GraphDB 10.4.0 and newer are not backward compatible and cannot be restored with older versions of GraphDB. However, restore procedures are backward compatible: GraphDB 10.4.0 and newer can restore backups created with older versions of GraphDB.

Planning a Backup

Whether you want to be able to quickly recover your data in case of failure or perform routine admin operations such as upgrading a GraphDB instance, it is important to prepare an optimal backup and restore procedure.

There are various factors to take into consideration when designing a backup strategy, such as:

  • Downtime tolerance and the optimal timing for applying a backup

  • Read-only tolerance on a single node setup for creating a backup

  • Load-balanced backup creation (backup is created by one of the followers, so if a quorum exists, updates will be processed)

  • Scope of the backed-up data (e.g., full or per-repository backup, or whether user accounts and settings are included)

  • Available system resources and specifically ensuring enough disk space for backup

  • Frequency of backup creation
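As a rough illustration of the disk space point above, the following shell sketch checks that a target directory has at least a chosen amount of free space before a backup is started. The function names and the threshold are hypothetical and not part of GraphDB; how much space a backup actually needs depends on the size of your repositories.

```shell
#!/usr/bin/env bash
# Sketch only: function names and threshold are illustrative, not part of GraphDB.

# Print the free space, in kilobytes, of the file system that holds directory $1.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if directory $1 has at least $2 kilobytes free.
has_free_space() {
  local dir="$1"
  local required_kb="$2"
  local available_kb
  available_kb="$(free_kb "$dir")"
  [ "$available_kb" -ge "$required_kb" ]
}

# Example: require roughly 10 GB free in the current directory before backing up.
# has_free_space . 10485760 || echo "Not enough disk space for a backup" >&2
```

A check like this could run at the start of a scheduled backup job so that the job stops early instead of filling the disk.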

Creating a Backup

As mentioned, backups can cover either all repositories (full data backup) or only selected existing repositories (partial data backup), and they may also include the user accounts and settings.

Cluster backup creation is lock-free, meaning that by leveraging the multiple instances and quorum mechanism, the cluster can create a backup while simultaneously processing updates if the deployment has more than 2 nodes.

A GraphDB instance can be backed up using the /rest/recovery/backup endpoint. To create a backup, send an HTTP POST request as shown below.

Note

Creating a backup requires the administrator role.

Backup options

The following parameters can be configured when creating a backup:

Option

Description

repositories

List of repositories to be backed up. Specified as JSON in the request body.

  • If the parameter is missing, all repositories will be included in the backup.

  • If it is an empty list ([]), no repositories will be included in the backup.

  • Otherwise, the repositories from the list will be included in the backup.

backupSystemData

Determines whether user account data such as user accounts, saved queries, visual graphs etc. should be included in the backup. Specified as JSON in the request body. Boolean, the default value is false.
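The three cases for the repositories parameter (missing, empty list, explicit list) are easy to mix up when building the request body by hand. The following sketch shows one way to assemble the JSON body in a shell script. The helper name backup_body and its --all flag are hypothetical, and the sketch assumes repository names that need no JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only: backup_body is a hypothetical helper, not a GraphDB command.
# Usage: backup_body SYSTEM_DATA --all      -> "repositories" omitted (all repositories)
#        backup_body SYSTEM_DATA [REPO...]  -> explicit list; no names gives []
# Assumes repository names need no JSON escaping.
backup_body() {
  local system_data="$1"
  shift
  if [ "${1:-}" = "--all" ]; then
    # Leaving out "repositories" means every repository is backed up.
    printf '{"backupSystemData":%s}\n' "$system_data"
  else
    local list=""
    local repo
    for repo in "$@"; do
      # Append each quoted name, comma-separated.
      list="${list:+$list,}\"$repo\""
    done
    printf '{"repositories":[%s],"backupSystemData":%s}\n' "$list" "$system_data"
  fi
}

# Example: back up two repositories without system data.
# curl -X POST -OJ -H 'Content-Type: application/json' \
#   -d "$(backup_body false repo1 repo2)" '<base_url>/rest/recovery/backup'
```

Calling the helper with only the first argument produces an empty list, which corresponds to a backup that contains no repositories.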

Full data backup

Here is an example cURL request for full data backup creation without system data (i.e., user accounts and settings):

curl -X POST -OJ -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'

This does the following:

  • Backs up all data in all repositories.

  • Does not include user accounts and settings because backupSystemData = false by default (see the above Backup options).

  • Creates the backup as a new file named in the format backup-yyyy-mm-dd-hh-mm-ss.tar.

Note

This is an archive file that you do not need to extract – it is to be used as is.

To set the name of the backup yourself, replace -OJ with --output <backup-name>, i.e.:

curl -X POST --output <backup-name> -H 'Content-Type: application/json' '<base_url>/rest/recovery/backup'
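When backups are scripted, a plain curl call with --output saves whatever the server returns, including an error response, under the backup file name. The following sketch is one possible way to guard against that, not a part of GraphDB: it generates a timestamped file name and uses curl's --fail option so that HTTP errors result in a non-zero exit code. The function names backup_file_name and create_backup are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not GraphDB commands.

# Print a file name that follows the backup-yyyy-mm-dd-hh-mm-ss.tar pattern.
backup_file_name() {
  printf 'backup-%s.tar\n' "$(date +%Y-%m-%d-%H-%M-%S)"
}

# create_backup BASE_URL OUT_FILE [JSON_BODY]
# Without JSON_BODY no request body is sent, so the defaults apply.
create_backup() {
  local base_url="$1"
  local out_file="$2"
  local body="${3:-}"
  # Build the curl argument list; add -d only when a body was given.
  set -- --fail --silent --show-error -X POST \
    --output "$out_file" -H 'Content-Type: application/json'
  [ -z "$body" ] || set -- "$@" -d "$body"
  if ! curl "$@" "$base_url/rest/recovery/backup"; then
    rm -f "$out_file"   # do not leave a partial or empty file behind
    return 1
  fi
}

# Example (replace <base_url> with the address of your instance):
# create_backup '<base_url>' "$(backup_file_name)" || echo "Backup failed" >&2
```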

Partial data backup

Here is an example cURL request for partial data backup creation without system data:

curl -X POST -OJ -H 'Content-Type: application/json' -d '{
   "repositories":["<repo_name>"]
}' '<base_url>/rest/recovery/backup'

This does the following:

  • Backs up one or more repositories that are explicitly named.

  • Does not include user accounts and settings as backupSystemData = false by default.

You can also use --output <backup-name> instead of -OJ if you want to customize the name of the backup as shown above.

Note

If a POST request does not include a list of repositories for backup, it will automatically create a full data backup.

Full data and system backup

Here is an example cURL request for full data backup creation with system data:

curl -X POST -OJ -H 'Content-Type: application/json' -d '{
   "backupSystemData": true
}' '<base_url>/rest/recovery/backup'

This does the following:

  • Backs up all data in all repositories.

  • Includes user accounts and settings because backupSystemData = true is explicitly provided.

System data only backup

Here is an example cURL request for creating a backup of system data only:

curl -X POST -OJ -H 'Content-Type: application/json' -d '{
   "repositories" : [], "backupSystemData": true
}' '<base_url>/rest/recovery/backup'

This does the following:

  • Includes user accounts and settings because backupSystemData = true is explicitly provided.

  • No repositories are included in the backup as repositories is an empty list (repositories: []).

    Note

    If this parameter is not provided, all repositories will be included in the backup.

Restoring from a Backup

A GraphDB instance or cluster can be restored to a backed-up state through the /rest/recovery/restore endpoint.

The recovery procedure in the cluster is treated as a simple update as it leverages the Raft protocol that allows a set of distributed nodes to act as one.

Tip

It is recommended to truncate the cluster transaction log after a successful data restore, as the transaction log uses more storage space after a backup/restore procedure.

To restore a backup, send an HTTP POST request as shown below.

Note

Restoring a backup requires the administrator role.

Restore options

The following parameters can be configured when restoring from a backup:

Option

Description

repositories

List of repositories to recover from the backup. Specified as JSON in the request body.

  • If the parameter is missing, all repositories that are in the backup will be restored.

  • If it is an empty list ([]), no repositories from the backup will be restored.

  • Otherwise, the repositories from the list will be restored.

restoreSystemData

Determines whether GraphDB should restore user account data such as user accounts, saved queries, visual graphs, etc. from a backup or continue with their current state. Specified as JSON in the request body. If no system data is found in the backup, an error will be returned. Boolean, the default is false.

removeStaleRepositories

Removes the existing repositories on the GraphDB instance that are not restored from the backup. Boolean, the default is false, meaning that no repositories will be removed.

Full data restore preserving other repositories

If we have successfully created a backup and want to revert completely to the backed-up state while preserving the other existing repositories on the instance where we are restoring, we can use the cURL request below. No additional parameters are provided, so the defaults apply.

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        };type=application/json' \
-F 'file=@./<full-data-backup-name.tar>'

Note

The full-data-backup-name.tar file must be a full data backup created as shown in the Full data backup section above.

Full data restore with replace

We can also apply a backup and remove repositories that are not restored from it.

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        "removeStaleRepositories": true
    };type=application/json' \
-F 'file=@./<full-data-backup-name.tar>'

What this does:

  • Removes other repositories on the instance where the backup is applied as removeStaleRepositories = true.

  • Does a full data restore as the repositories parameter is not provided.

Partial data restore

Here, we need to provide the names of the repositories that we want to restore as values for the repositories parameter.

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        "repositories" : ["<repo-name>"]
    };type=application/json' \
-F 'file=@./<full-data-backup-name.tar>'

System data only restore

To restore only the system data from a backup, we can use the following cURL request:

curl -X POST '<base_url>/rest/recovery/restore' \
-H 'Content-Type: multipart/form-data' \
-F 'params={
        "repositories" : [],
        "restoreSystemData": true
    };type=application/json' \
-F 'file=@./<full-data-system-backup-name.tar>'

What this does:

  • User account data is restored as restoreSystemData = true.

  • No repositories are restored as the repositories parameter is an empty list ([]).

    Note

    The full-data-system-backup-name.tar file must contain system data, i.e., the backup must be created with backupSystemData = true as shown in the Full data and system backup section above.
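The restore requests above differ only in the params JSON, so in a script they can share one wrapper. The following is a minimal sketch under that assumption. The function name restore_backup and its exit codes are hypothetical; it relies on curl setting the multipart Content-Type header itself when -F is used, and it assumes the backup file path contains no commas or semicolons, which curl's -F syntax treats specially.

```shell
#!/usr/bin/env bash
# Sketch only: restore_backup is a hypothetical wrapper, not a GraphDB command.
# Usage: restore_backup BASE_URL BACKUP_FILE [PARAMS_JSON]
# PARAMS_JSON defaults to {} so that the default restore options apply.
restore_backup() {
  local base_url="$1"
  local backup_file="$2"
  local params="${3:-}"
  [ -n "$params" ] || params='{}'

  # Fail early, before contacting the server, if the archive is missing.
  if [ ! -r "$backup_file" ]; then
    echo "Backup file not readable: $backup_file" >&2
    return 2
  fi

  # --fail turns HTTP error responses into a non-zero exit code.
  curl --fail --silent --show-error -X POST "$base_url/rest/recovery/restore" \
    -F "params=$params;type=application/json" \
    -F "file=@$backup_file"
}

# Example: restore only the system data from a backup.
# restore_backup '<base_url>' ./backup.tar '{"repositories":[],"restoreSystemData":true}'
```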

Creating and restoring cloud backups

Note

Currently, only Amazon S3 cloud storage is supported.

You can also create a backup that is saved in the cloud, and restore from a backup stored in cloud storage. Cloud backup and restore have the same options as regular GraphDB backup and restore, with an additional bucketUri parameter that contains all the information about the cloud bucket. For Amazon S3, it uses the following format:

s3://[<endpoint-hostname>:<endpoint-port>]/<bucket-name>/<backup-name>?region=<AWSRegion>&AWS_ACCESS_KEY_ID=<key-id>&AWS_SECRET_ACCESS_KEY=<access-key>

The endpoint-hostname and endpoint-port values are only used for S3-compatible services. To use Amazon S3, leave these values blank so that the URL starts with three slashes (///) before the bucket name, as shown below:

s3:///my-bucket/my-graphdb-backup/?region=eu-west-1&AWS_ACCESS_KEY_ID=secretKey&AWS_SECRET_ACCESS_KEY=secret

If either AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY is not provided, GraphDB falls back to the standard AWS credential providers.

Starting from GraphDB 10.4.1, if region is not provided, GraphDB attempts to resolve the configured region through the default AWS region provider chain. If that fails, it defaults to us-east-1 – US East (N. Virginia). Furthermore, if the provided region does not match the one where the bucket is located, all requests made after the connection to the bucket has been established are automatically directed to the region the bucket resides in.
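To make the URI format more concrete, the following sketch assembles a bucketUri from its parts in a shell script. The helper name s3_bucket_uri is hypothetical. The sketch deliberately leaves the access keys out of the URI and relies on the credential fallback described above, which also keeps secrets out of the shell history. Omitting the region assumes GraphDB 10.4.1 or newer.

```shell
#!/usr/bin/env bash
# Sketch only: s3_bucket_uri is a hypothetical helper, not a GraphDB command.
# Usage: s3_bucket_uri BUCKET BACKUP_NAME [REGION] [ENDPOINT_HOST:PORT]
# Credentials are intentionally not added to the URI.
s3_bucket_uri() {
  local bucket="$1"
  local backup_name="$2"
  local region="${3:-}"
  local endpoint="${4:-}"

  # An empty endpoint yields the three slashes expected for Amazon S3.
  local uri="s3://$endpoint/$bucket/$backup_name"
  if [ -n "$region" ]; then
    uri="$uri?region=$region"
  fi
  printf '%s\n' "$uri"
}

# Examples:
# s3_bucket_uri my-bucket my-graphdb-backup eu-west-1
# s3_bucket_uri my-bucket nightly "" minio.local:9000
```

The second example targets an S3-compatible service. The host name minio.local:9000 is a made-up value for illustration.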

In addition to the default backup options, you can also set up these global parameters in your graphdb.properties file before starting your GraphDB instance:

| Property name | Description | Default value |
| --- | --- | --- |
| graphdb.s3.tls.enabled | Enables TLS for secure connection to S3 compatible services | false |
| graphdb.s3.backup.httpclient.write.timeout | Timeout in seconds for a cloud backup’s single part upload | 3600 |

Creating a cloud backup

The GraphDB instance uses a different endpoint when creating a backup saved in the cloud – /rest/recovery/cloud-backup.

Here is an example cURL request for full data backup creation with system data:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "backupOptions": { "backupSystemData": true },
  "bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-backup'

The backupOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.

The backup examples from above are also valid for the cloud backup. As long as the cloud backup is provided with the same backupOptions and the bucketUri is valid, the resulting backup .tar file should be the same.
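Because backupOptions is optional, the cloud backup request body has two shapes: with and without the options object. The following sketch builds either one in a shell script. The helper name cloud_backup_body is hypothetical, and the sketch assumes that the bucket URI contains no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only: cloud_backup_body is a hypothetical helper, not a GraphDB command.
# Usage: cloud_backup_body BUCKET_URI [BACKUP_OPTIONS_JSON]
# Assumes BUCKET_URI needs no JSON escaping (no quotes or backslashes).
cloud_backup_body() {
  local bucket_uri="$1"
  local options="${2:-}"
  if [ -n "$options" ]; then
    printf '{"backupOptions":%s,"bucketUri":"%s"}\n' "$options" "$bucket_uri"
  else
    # Without backupOptions, the default backup options are used.
    printf '{"bucketUri":"%s"}\n' "$bucket_uri"
  fi
}

# Example: full data backup with system data, sent to the cloud backup endpoint.
# curl -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' \
#   -d "$(cloud_backup_body 's3:///<bucket_name>/<backup_name>?region=<region>' '{"backupSystemData":true}')" \
#   '<base_url>/rest/recovery/cloud-backup'
```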

Restoring from a cloud backup

The GraphDB instance uses a different endpoint when restoring from a backup saved on cloud storage – /rest/recovery/cloud-restore.

Here is an example cURL request for applying a backup and removing all repositories that are not restored from it:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "restoreOptions": { "removeStaleRepositories": true },
  "bucketUri": "s3:///<bucket_name>/<backup_name>?region=<region>&AWS_ACCESS_KEY_ID=<key_id>&AWS_SECRET_ACCESS_KEY=<key>"
}' '<base_url>/rest/recovery/cloud-restore'

The restoreOptions parameter is optional. If nothing is passed for it, the default values of the options will be used.

The restore examples from above are also valid for the cloud restore. As long as the cloud restore endpoint is provided with the same restoreOptions and the bucketUri points to a valid GraphDB backup file, the resulting restore should be the same.

Monitoring your recovery operations

You can monitor your backups through Monitor ‣ Backup and Restore. The backup monitoring interface displays the backups and restores that are currently underway, and displays additional information, such as the recovery operation type (backup or restore), the user who initiated the procedure, the affected repositories, how much time has elapsed since the procedure was initiated, and the snapshot options.

This interface also allows you to temporarily Pause updates of the table so that you can copy text from it.

[Figure: the Backup and Restore monitoring view in Workbench]

Warning

The Pause button doesn’t pause the recovery operation itself – it merely freezes the information presented in Workbench and prevents the table from updating.

You can also access the Backup and Restore interface through the Global Monitoring notification area on the top of every Workbench view.

[Figure: the Global Monitoring notification area in Workbench]