Creating and Managing a Cluster¶
The instructions below will walk you through the steps for creating, managing, and monitoring a cluster group.
Prerequisites¶
You will need at least three GraphDB installations to create a fully functional cluster. Remember that the Raft algorithm recommends an odd number of nodes, so a cluster of five nodes is a good choice too.
All of the nodes must have the same security settings, and in particular the same shared token secret even when security is disabled.
For all GraphDB instances, set the following configuration property in the graphdb.properties file and change <my-shared-secret-key> to the desired secret:
graphdb.auth.token.secret = <my-shared-secret-key>
All of the nodes must have their networking configured correctly – the hostname reported by the OS must be resolvable to the correct IP address on each node that will participate in the cluster. If your networking is not configured correctly or you are not sure, you can set the hostname for each node by putting graphdb.hostname = hostname.example.com into each graphdb.properties file, where hostname.example.com is the hostname for that GraphDB instance to use in the cluster. If you do not have resolvable hostnames, you can supply an IP address instead.
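For example, the graphdb.properties file of the first node might then contain both settings (the secret value below is a placeholder):

# Shared cluster secret – must be identical on all nodes
graphdb.auth.token.secret = my-shared-secret-key
# Hostname this node advertises to the other cluster members
graphdb.hostname = graphdb1.example.com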
The examples below assume that there are five nodes reachable at the hostnames graphdb1.example.com, graphdb2.example.com, graphdb3.example.com, graphdb4.example.com, and graphdb5.example.com.
High availability deployment¶
A typical scenario is a deployment in cloud infrastructure with the ability to place GraphDB instances in different regions or zones, so that if a region/zone fails, the GraphDB cluster continues functioning without any issues for the end user.
To achieve high availability, it is recommended to deploy GraphDB instances in different zones/regions while considering the need for a majority quorum in order to be able to accept INSERT/DELETE requests. This means that the deployment should always have over 50% of the instances running.
Another recommendation is to distribute the GraphDB instances so that you do not have exactly half of the GraphDB instances in one zone and the other half in another zone, as this way it would be easy to lose the majority quorum. In such cases, it is better to use three zones.
Cluster group with three nodes¶
In a cluster with three nodes, we need at least two in order to be able to write data successfully. In this case, the best deployment strategy is to have three GraphDB instances distributed in three zones in the same region. This way, if one zone fails, the other two instances will still form a quorum and the cluster will accept all requests.
Note
Having the instances in different regions may introduce latency.
Cluster group with five nodes¶
In a cluster with five nodes, we need three nodes for a quorum. If we have three available regions/zones, we can deploy:
two instances in zone 1,
two instances in zone 2,
one instance in zone 3.
If any of the zones fail, we would still have at least three more GraphDB instances that will form a quorum.
Create cluster¶
A cluster can be created interactively from the Workbench or programmatically via the REST API.
Using the Workbench¶
Open any of the GraphDB instances that you want to be part of the cluster, for example http://graphdb1.example.com:7200, and go to the cluster management view.
Click the icon to create a cluster group.
In the dialog that opens, you can see that the current GraphDB node is discovered automatically. Click Attach remote location and add the other four instances as remote instances:
This is essentially the same operation as when connecting to a remote GraphDB instance.
Clicking on Advanced settings opens an additional panel with settings that affect the entire cluster group, but the defaults should be good for a start:
Once you have added all nodes (in this case graphdb2.example.com:7200, graphdb3.example.com:7200, graphdb4.example.com:7200, and graphdb5.example.com:7200, since graphdb1.example.com:7200 was discovered automatically and always has to be part of the cluster), click on each of them to include them in the cluster group. Click OK.
At first, all nodes become followers (colored in blue). Then one of the nodes initiates election, after which for a brief moment, one node becomes a candidate (you may see it briefly flash in green), and finally a leader (colored in orange).
In this example, graphdb1.example.com became the leader, but it could have been any of the other four nodes. The fact that graphdb1.example.com was used to create the cluster does not affect the leader election process in any way.

All possible node and connection states are listed in the legend on the bottom left, which you can toggle by clicking the question mark icon.
You can also add or remove nodes from the cluster group, as well as delete it.
See also Using a Cluster.
Using the REST API¶
You can also create a cluster using the respective REST API – see the interactive REST API documentation in the Workbench. The examples below use cURL.
To create the cluster group, simply POST the desired cluster configuration to the /rest/cluster/config endpoint of any of the nodes (in this case http://graphdb1.example.com:7200):
Each node uses the default HTTP port of 7200 and the default RPC port of 7300.
Tip
The default RPC port is the HTTP port + 100. Thus, when the HTTP port is 7200, the RPC port will be 7300. You can set a custom RPC port using graphdb.rpc.port = NNNN, where NNNN is the chosen port.
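For example, a node configured with a custom HTTP port might also set an explicit RPC port in its graphdb.properties (both values below are placeholders):

# Custom HTTP port for this node
graphdb.connector.port = 7201
# Explicit RPC port (would otherwise default to HTTP port + 100, i.e., 7301)
graphdb.rpc.port = 7301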
curl -X POST -H 'Content-type: application/json' \
http://graphdb1.example.com:7200/rest/cluster/config \
-d '{
"nodes": [
"graphdb1.example.com:7300",
"graphdb2.example.com:7300",
"graphdb3.example.com:7300"
]
}'
Just like in the Workbench, you do not need to specify the advanced settings if you want to use the defaults. If needed, you can specify them like this:
curl -X POST -H 'Content-type: application/json' \
http://graphdb1.example.com:7200/rest/cluster/config \
-d '{
"nodes": [
"graphdb1.example.com:7300",
"graphdb2.example.com:7300",
"graphdb3.example.com:7300"
],
"electionMinTimeout": 7000,
"electionRangeTimeout": 5000,
"heartbeatInterval": 2000,
"messageSizeKB": 64,
"verificationTimeout": 1500
}'
201: Cluster successfully created
If the cluster group has been successfully created, you will get a 201 Success response code, and the returned response body will indicate that the cluster is created on all nodes:
{ "graphdb1.example.com:7300": "CREATED", "graphdb2.example.com:7300": "CREATED", "graphdb3.example.com:7300": "CREATED" }
400: Invalid cluster configuration
If the JSON configuration of the cluster group is invalid, the returned response code will be 400 Bad request.
412: Unreachable nodes or cluster already existing on a node
If at cluster config creation, some nodes are unreachable or a cluster group already exists on a given node, the response code will be 412 Precondition failed. The error will be shown in the JSON response body:
unreachable node:
{ "graphdb1.example.com:7301": "NOT_CREATED", "graphdb2.example.com:7302": "NO_CONNECTION", "graphdb3.example.com:7303": "NOT_CREATED" }
cluster already existing on a node:
{ "graphdb1.example.com:7301": "NOT_CREATED", "graphdb2.example.com:7302": "ALREADY_EXISTS", "graphdb3.example.com:7303": "NOT_CREATED" }
Creation parameters¶
The cluster group configuration has several properties that have sane default values:
| Parameter | Default value | Description |
|---|---|---|
| electionMinTimeout | 8000 | The minimum wait time in milliseconds for a heartbeat from a leader. |
| electionRangeTimeout | 6000 | The variable portion of each waiting period in milliseconds for a heartbeat. |
| heartbeatInterval | 2000 | The interval in milliseconds between each heartbeat that is sent to follower nodes by the leader. |
| verificationTimeout | 1500 | The amount of time in milliseconds a follower node would wait before attempting to verify the last committed entry when the first verification is unsuccessful. |
| messageSizeKB | 64 | The size in kilobytes of the data blocks transferred during data replication streaming through the RPC protocol. |
Manage cluster membership¶
We can add and remove cluster nodes at runtime without having to stop the entire cluster group. This is achieved through total consensus between the nodes in the new configuration when making a change to the cluster membership.
When adding nodes, a total consensus means that all nodes, both the new and the old ones, have successfully appended the configuration.
If there is no majority of nodes responding to heartbeats, we can remove the non-responsive ones all at once. In this situation, a total consensus on the new configuration would be enough for this operation to be executed successfully.
It is recommended to remove fewer than 1/2 of the nodes from the current configuration.
Add nodes¶
Using the Workbench¶
New nodes can be added to the cluster group only from the leader. From the cluster management view, just like with cluster creation, attach the node’s HTTP address as a remote location and click OK.
Using the REST API¶
Send a POST request with the nodes to be added to the leader’s /rest/cluster/config/node endpoint:
curl -X POST -H 'Content-Type: application/json' \
'http://graphdb1.example.com:7200/rest/cluster/config/node' \
-d '{
"nodes": [
"graphdb3.example.com:7300"
]
}'
If one of the nodes from the group or from the newly added ones has no connection to any of the nodes, an error message will be returned. This is because a total consensus between the nodes in the new group is needed to accept the configuration, which means that all of them should see each other.
If the added node is part of a different cluster, an error message will be returned.
Only the leader can make cluster membership changes, so if a follower tries to add a node to the cluster group, again an error message will be returned.
The added node should be either empty or in the same state as the cluster, which means that it should have the same repositories and namespaces as the nodes in the cluster. If one of these conditions is not met, you will not be able to add the node.
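For example, a hedged way to check the repository part of this requirement is to compare the repositories reported by an existing cluster node and by the candidate node via the repositories REST endpoint (graphdb6.example.com below is a hypothetical new node):

# List repositories on an existing cluster node and on the node to be added
curl http://graphdb1.example.com:7200/rest/repositories
curl http://graphdb6.example.com:7200/rest/repositories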
Remove nodes¶
Using the Workbench¶
Nodes can be removed from the cluster group only from the leader. From the cluster management view, click on the nodes that you want to remove and click OK.
Using the REST API¶
Send a DELETE request with the nodes to be removed to the leader’s /rest/cluster/config/node endpoint:
curl -X DELETE -H 'Content-Type: application/json' \
'http://graphdb1.example.com:7200/rest/cluster/config/node' \
-d '{
"nodes": [
"graphdb3.example.com:7300"
]
}'
If one of the nodes remaining in the new cluster configuration is down or not visible to the others, the operation will not be successful. This is because a total consensus between all nodes in the new configuration is needed, so all of them should see each other.
If a node is down, it can still be removed as it will not be part of the new configuration. If started again, the node will “think” that it is still part of the cluster and will be stuck in candidate state. The rest of the nodes will not accept any communication coming from it. In such a case, the cluster can be manually deleted only on this node, from its own cluster management view.
Manage cluster configuration properties¶
You can view and manage the cluster configuration properties both from the Workbench and the REST API.
Using the Workbench¶
To view the properties, go to the cluster management view and click the cog icon on the top right. It will open a panel showing the cluster group config properties and a list of its nodes.

To modify the config properties, click Edit configuration.
Important
Editing of these properties is only possible on the leader node.
Using the REST API¶
To view the cluster configuration properties, send a GET request to the /rest/cluster/config endpoint on any of the nodes:
curl http://graphdb1.example.com:7200/rest/cluster/config
Alternatively, in the interactive REST API documentation, go to GET /rest/cluster/config and click Try it out.
200: Returns cluster configuration
If the cluster configuration is retrieved successfully, the response code will be 200 Success.
404: Cluster not found
If no cluster group has been found, the returned response code will be 404. One such case could be when you attempt to create a cluster group with just one GraphDB node.
To update the config properties, perform a PATCH request containing the parameters of the new config to the /rest/cluster/config endpoint:
curl -X PATCH -H 'Content-Type: application/json' \
'http://graphdb1.example.com:7200/rest/cluster/config' \
-d '{
"electionMinTimeout": 7300,
"electionRangeTimeout": 5000,
"heartbeatInterval": 2000,
"messageSizeKB": 64,
"verificationTimeout": 1500
}'
If one of the cluster nodes is down or was not able to accept the new configuration, the operation will not be successful. This is because we need a total consensus between the nodes, so if one of them cannot append the new config, all of them will reject it.
Monitor cluster status¶
To check the current status of the cluster, including the current leader, open the Workbench and go to the cluster management view. The solid green lines indicate that the leader is IN_SYNC with all followers.

Clicking on a node will display some basic information about it, such as its state (leader or follower) and RPC address. Clicking on its URL will open the node in a new browser tab.
You can also use the REST API to get more detailed information or to automate monitoring.
Cluster group¶
To check the status of the entire cluster group, send a GET request to the /rest/cluster/group/status endpoint of any of the nodes, for example:
curl http://graphdb1.example.com:7200/rest/cluster/group/status
If there are no issues with the cluster group, the returned response code will be 200 with the following result:
[
{
"address" : "graphdb1.example.com:7300",
"endpoint" : "http://graphdb1.example.com:7200",
"lastLogIndex" : 0,
"lastLogTerm" : 0,
"nodeState" : "LEADER",
"syncStatus" : {
"graphdb2.example.com:7300" : "IN_SYNC",
"graphdb3.example.com:7300" : "IN_SYNC"
},
"term" : 2
},
{
"address" : "graphdb2.example.com:7300",
"endpoint" : "http://graphdb2.example.com:7200",
"lastLogIndex" : 0,
"lastLogTerm" : 0,
"nodeState" : "FOLLOWER",
"syncStatus" : {},
"term" : 2
},
{
"address" : "graphdb3.example.com:7300",
"endpoint" : "http://graphdb3.example.com:7200",
"lastLogIndex" : 0,
"lastLogTerm" : 0,
"nodeState" : "FOLLOWER",
"syncStatus" : {},
"term" : 2
}
]
Response code 404 will be returned if no cluster has been found.
Note
Any node, regardless of whether it is a leader or a follower, will return the status for all nodes in the cluster group.
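Since any node can report the status of the whole group, a basic automated health check could simply poll one node and look for a reported leader. A minimal sketch, assuming a POSIX shell and no security on the endpoint:

# Poll the group status and check whether any node reports the LEADER state
curl -s http://graphdb1.example.com:7200/rest/cluster/group/status \
    | grep -q '"LEADER"' \
    && echo "Cluster has a leader" \
    || echo "No leader reported"

A production check would typically parse the JSON response properly rather than matching text.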
Cluster node¶
To check the status of a single cluster node, send a GET request to the /rest/cluster/node/status endpoint of the node, for example:
curl http://graphdb1.example.com:7200/rest/cluster/node/status
If there are no issues with the node, the returned response code will be 200 with the following information (for a leader):
{
"address" : "graphdb1.example.com:7300",
"endpoint" : "http://graphdb1.example.com:7200",
"lastLogIndex" : 0,
"lastLogTerm" : 0,
"nodeState" : "LEADER",
"syncStatus" : {
"graphdb2.example.com:7300" : "IN_SYNC",
"graphdb3.example.com:7300" : "IN_SYNC"
},
"term" : 2
}
For a follower node (in this case graphdb2.example.com):
{
"address" : "graphdb2.example.com:7300",
"endpoint" : "http://graphdb2.example.com:7200",
"lastLogIndex" : 0,
"lastLogTerm" : 0,
"nodeState" : "FOLLOWER",
"syncStatus" : {},
"term" : 2
}
Delete cluster¶
To delete a cluster, open the Workbench and go to the cluster management view. Click Delete cluster and confirm the operation.

Warning
This operation deletes the cluster group on all nodes, and can be executed from any node regardless of whether it is a leader or a follower. Proceed with caution.
You can also use the REST API to automate the delete operation. Send a DELETE request to the /rest/cluster/config endpoint of any of the nodes, for example:
curl -X DELETE http://graphdb1.example.com:7200/rest/cluster/config
By default, the cluster group cannot be deleted if one or more nodes are unreachable. Reachable here means that the node is not in status NO_CONNECTION, i.e., there is an RPC connection to it.
200: Cluster deleted
If the deletion is successful, the response code will be 200 and the returned response body will be:
{
"graphdb1.example.com:7300": "DELETED",
"graphdb2.example.com:7300": "DELETED",
"graphdb3.example.com:7300": "DELETED"
}
412: Unreachable nodes
If one or more nodes in the group are not reachable, the delete operation will fail with response code 412 and return:
{
"graphdb1.example.com:7300": "NOT_DELETED",
"graphdb2.example.com:7300": "NO_CONNECTION",
"graphdb3.example.com:7300": "NOT_DELETED"
}
Force parameter¶
The optional force parameter (false by default) enables you to bypass this restriction and delete the cluster group on the nodes that are reachable (see the cURL example further below):

When set to false, the cluster configuration will not be deleted on any node if at least one of the nodes is unreachable.

When set to true, the cluster configuration will be deleted only on the reachable nodes, and not deleted on the unreachable ones. In such a case, the returned response will be 200:

{
    "graphdb1.example.com:7300": "DELETED",
    "graphdb2.example.com:7300": "io.grpc.StatusRuntimeException: UNAVAILABLE: io exception",
    "graphdb3.example.com:7300": "DELETED"
}
In the Workbench, the force option can be enabled when deleting the cluster group.

Configure external cluster proxy¶
The external cluster proxy can be deployed separately on its own URL. This way, you do not need to know where all cluster nodes are. Instead, there is a single URL that will always point to the leader node.
The externally deployed proxy behaves like a regular GraphDB instance, including opening and using the Workbench. It always knows which node is the leader and serves all requests to the current leader.
Note
The external proxy does not require a GraphDB SE/EE license.
Start the external proxy¶
To start the external proxy:

Execute the cluster-proxy script in the bin directory of the GraphDB distribution.

Provide the cluster secret.

Provide a GraphDB server HTTP or RPC address to at least one of the nodes in the cluster. You can provide either the HTTP or the RPC address of the node – they are interchangeable. For example:
./bin/cluster-proxy -g http://graphdb1.example.com:7200,http://graphdb2.example.com:7200
A console message will inform you that GraphDB has been started in proxy mode.
Cluster proxy options
The cluster-proxy script supports the following options:
| Option | Description |
|---|---|
| -d, --daemon | Daemonize (run in background) |
| -r, --follower-retries <num> | Number of times to retry a request to a different node in the cluster |
| -g, --graphdb-hosts <address> | List of GraphDB nodes’ HTTP or RPC addresses that are part of the same cluster |
| -h, --help | Print command line options |
| -p, --pid-file <file> | Write PID to <file> |
| -Dprop | Set Java system property |
| -Xprop | Set non-standard Java system property |
By default, the proxy will start on port 7200. To change it, use, for example, -Dgraphdb.connector.port=7201.

As mentioned above, the default RPC port of the proxy is the HTTP proxy port + 100, which will be 7300 if you have not used a custom HTTP port. You can change the RPC port by setting, for example, -Dgraphdb.rpc.port=7301 or -Dgraphdb.rpc.address=graphdb-proxy.example.com:7301, e.g.:
./bin/cluster-proxy -Dgraphdb.connector.port=7201 -Dgraphdb.rpc.port=7301 -g http://graphdb1.example.com:7200
Important
Remember to set -Dgraphdb.auth.token.secret=<cluster-secret> with the same secret with which you have set up the cluster. If the secrets do not match, some of the proxy functions may appear to work correctly, but the proxy will still be misconfigured and you may experience unexpected behavior at any time.
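For example, a start command passing the secret as a Java system property (the secret value is a placeholder):

./bin/cluster-proxy -Dgraphdb.auth.token.secret=my-shared-secret-key \
    -g http://graphdb1.example.com:7200,http://graphdb2.example.com:7200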
The external proxy works with two types of cluster node lists: static and dynamic.
The static list is provided to the proxy through the -g/--graphdb-hosts option of the script. This is a comma-separated list of HTTP or RPC addresses of cluster nodes. At least one address of an active node should be provided. Once the proxy is started, it tries to connect to each of the nodes provided in this list. If it succeeds with one of them, it then builds the dynamic cluster node list.

The dynamic cluster node list is built by requesting the cluster’s current status from one of the nodes in the static list. The proxy then subscribes to any changes in the cluster status – leader changes, nodes being added or removed, nodes out of reach, etc. The external proxy always sends all requests to the current cluster leader. If there is no leader at the moment, or the leader is unreachable, requests will go to a random node.
Note
The dynamic cluster node list is reset every time the external proxy is restarted. After each restart, the proxy knows only about the nodes listed in the static node list provided by the -g/--graphdb-hosts option of the script.
External proxy for cluster running over SSL¶
To set up the external proxy to connect to a cluster over SSL, the same options used to set up GraphDB with security can be provided to the cluster-proxy script. The most common ones are:
-Dgraphdb.connector.SSLEnabled=true
-Dgraphdb.connector.scheme=https
-Dgraphdb.connector.secure=true
-Dgraphdb.connector.keystoreFile=<path-to-keystore-file>
-Dgraphdb.connector.keystorePass=<keystore-pass>
-Dgraphdb.connector.keyAlias=<alias-to-key-in-keystore>
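Putting it together, a start command for a proxy in front of an SSL-enabled cluster might look like the sketch below; the keystore path, password, alias, and secret are placeholders:

./bin/cluster-proxy -g https://graphdb1.example.com:7200 \
    -Dgraphdb.auth.token.secret=my-shared-secret-key \
    -Dgraphdb.connector.SSLEnabled=true \
    -Dgraphdb.connector.scheme=https \
    -Dgraphdb.connector.secure=true \
    -Dgraphdb.connector.keystoreFile=/opt/graphdb/proxy-keystore.jks \
    -Dgraphdb.connector.keystorePass=changeit \
    -Dgraphdb.connector.keyAlias=graphdb-proxy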
For more information on the cluster security options, please see below.
Cluster security¶
Encryption¶
As there is a lot of traffic between the cluster nodes, it is important that it is encrypted. In order to do so, the following requirements need to be met:
SSL/TLS should be enabled on all cluster nodes.
The nodes’ certificates should be trusted by the other nodes in the cluster.
The method of enabling SSL/TLS is already described in Configuring GraphDB instance with SSL. There are no differences when setting up the node to be used as a cluster one.
See how to set up certificate trust between the nodes here.
Access control¶
Authorization and authentication methods in the cluster do not differ from those for a regular GraphDB instance. The rule of thumb is that all nodes in the cluster group must have the same security configuration.
For example, if SSL is enabled on one of the nodes, you must enable it on the other nodes as well; or if you have configured OpenID on one of the nodes, it needs to be configured on the rest of them as well.
Truncate cluster transaction log¶
The truncate log operation is used to free up storage space on all cluster nodes by clearing the current transaction log and removing cached recovery snapshots. It can be triggered with a POST request to the /rest/cluster/truncate-log endpoint.
Note
The operation requires a healthy cluster, i.e., one where a leader node is present and all follower nodes are IN_SYNC. The reason for this is that the truncate log operation is propagated to each node in the cluster through the Raft quorum mechanism, and the log is then truncated on each node in turn.
You can truncate the cluster log with the following cURL request:
curl -X POST '<base_url>/rest/cluster/truncate-log'
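For example, against one of the nodes used throughout this document:

curl -X POST 'http://graphdb1.example.com:7200/rest/cluster/truncate-log'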