Cluster basics¶
GraphDB EE is a high-performance, clustered semantic repository proven to scale in production environments where simultaneous loading, querying and inferencing of tens of billions of RDF statements occur in real time. It supports automatic failover, synchronisation and load balancing to maximise cluster utilisation.
A GraphDB EE cluster is organised as one or more master nodes that manage one or more worker nodes. Failover and load balancing between worker nodes is automatic. Multiple master nodes ensure continuous cluster performance, even in the event of a master node failure. The cluster deployment can be modified while running, which allows worker nodes to be added during peak times or released for maintenance, backup, etc.
Structure¶
The GraphDB EE cluster is made up of two basic node types: masters and workers.

Master node¶
The master node manages and distributes atomic requests (query evaluations and update transactions) to a set of workers. It does not store any RDF data itself and therefore requires limited resources.
Hint
From an external point of view, a master behaves exactly like any other RDF4J/GraphDB repository that is exposed via the RDF4J HTTP server, but utilises worker nodes (also exposed via an RDF4J HTTP server) to store and manage RDF data. In this way, query performance can be increased by having worker nodes answer queries in parallel.
The master node is responsible for:
- coordinating all read and write operations;
- ensuring that all worker nodes are synchronised;
- propagating updates (insert and delete tasks) across all workers and checking updates for inconsistencies;
- load balancing read requests (query execution tasks) between all available worker nodes;
- providing a uniform entry point for client software, where the client interacts with the cluster as though it were a normal RDF4J repository (see the client sketch after this list);
- providing a JMX interface for monitoring and administering the cluster;
- automatic cluster re-configuration in the event of failure of one or more worker nodes;
- user-directed dynamic configuration of the cluster to add/remove worker nodes.
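Since the master looks like an ordinary repository, a client needs no cluster-specific code. The following sketch uses the standard RDF4J client API to query the cluster through the master; the server URL and repository ID are illustrative:

```java
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class ClusterClient {
    public static void main(String[] args) {
        // The master's HTTP endpoint; the URL and repository ID are illustrative.
        Repository master = new HTTPRepository("http://master1:7200", "cluster-repo");
        master.init();
        try (RepositoryConnection conn = master.getConnection();
             TupleQueryResult result = conn
                 .prepareTupleQuery(QueryLanguage.SPARQL, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
                 .evaluate()) {
            // The master transparently forwards the query to one of its workers.
            while (result.hasNext()) {
                System.out.println(result.next());
            }
        } finally {
            master.shutDown();
        }
    }
}
```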
Tip
A cluster can contain more than one master node. Thus, every master monitors the health of all workers and can distribute query execution tasks between them, effectively allowing a cluster to have multiple entry points for queries.
Worker nodes¶
Worker nodes are based on the same technology as GraphDB SE repositories. The master node accesses them over HTTP through the SPARQL endpoint exposed by the RDF4J service.
Worker nodes require massive resources, as they are responsible for:
- storing all the data;
- executing all read and write operations.
WRITE requests (loading data)¶
Every WRITE request is sent to all available workers. The master keeps a log of the recent transactions, while all workers keep a full copy of the repository data.
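Since the master exposes a standard repository interface, an update is just an ordinary RDF4J transaction committed against the master's endpoint; the propagation to the workers happens behind the scenes. A minimal sketch, with an illustrative endpoint:

```java
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class ClusterWrite {
    public static void main(String[] args) {
        // The master's HTTP endpoint; the URL and repository ID are illustrative.
        Repository master = new HTTPRepository("http://master1:7200", "cluster-repo");
        master.init();
        try (RepositoryConnection conn = master.getConnection()) {
            ValueFactory vf = conn.getValueFactory();
            conn.begin();       // open a transaction against the master
            conn.add(vf.createIRI("urn:example:s"),
                     vf.createIRI("urn:example:p"),
                     vf.createLiteral("o"));
            conn.commit();      // the master logs the update and propagates it to every worker
        } finally {
            master.shutDown();
        }
    }
}
```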

READ requests (query evaluations)¶
Every READ request (SPARQL query) is passed to one of the available worker nodes. The node is chosen based on the number of queries currently running on it, which ensures that the read load is spread throughout the cluster. The available nodes are organised in a priority queue, which is rearranged after every read operation. If an error occurs (time-out, lost connection, etc.), the request is resent to one of the other available nodes, and so on until it is successfully processed.
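Conceptually, this strategy is a priority queue keyed on each worker's current load, with a retry loop around it. The following is a simplified model of that behaviour, not GraphDB's actual implementation; the Worker class and the sendQuery() stub are hypothetical:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class ReadBalancer {
    // Hypothetical model of a worker node; GraphDB's internal classes are not public API.
    static class Worker {
        final String url;
        int runningQueries;
        Worker(String url) { this.url = url; }
    }

    // Workers ordered by the number of queries currently running on them.
    private final PriorityQueue<Worker> queue =
        new PriorityQueue<>(Comparator.comparingInt((Worker w) -> w.runningQueries));

    public ReadBalancer(Worker... workers) {
        for (Worker w : workers) {
            queue.add(w);
        }
    }

    /** Sends a query to the least-loaded worker, falling back to the next one on failure. */
    public String evaluate(String sparql) throws Exception {
        Exception lastError = null;
        while (!queue.isEmpty()) {
            Worker w = queue.poll();    // least-loaded worker
            try {
                w.runningQueries++;
                String result = sendQuery(w, sparql);
                w.runningQueries--;
                queue.add(w);           // re-insert: the queue is rearranged after every read
                return result;
            } catch (Exception e) {
                lastError = e;          // time-out, lost connection, etc.: try the next worker
            }
        }
        throw new Exception("Query failed on all available workers", lastError);
    }

    private String sendQuery(Worker w, String sparql) {
        return "...";                   // stub: would POST the query to w.url over HTTP
    }
}
```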

Updates (transaction logs)¶
Each update is first logged. Then a probe node is selected and the update is sent to it. If the update succeeds, control queries are evaluated against the probe to check the consistency of the data it holds. If this control suite passes, the update is forwarded to the rest of the worker nodes in the cluster.

During the handling of an update, the nodes that do not successfully process it are disabled and scheduled for recovery replication.
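The flow can be summarised in a short sketch. All class and method names below are hypothetical and only model the protocol described above, not GraphDB's internal code:

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateCoordinator {
    // Hypothetical worker abstraction; names are illustrative, not GraphDB API.
    interface Worker {
        void apply(String update) throws Exception;       // run the update on this worker
        boolean passesControlQueries() throws Exception;  // consistency check after the update
    }

    private final List<String> transactionLog = new ArrayList<>();

    public void handleUpdate(String update, List<Worker> workers) {
        transactionLog.add(update);                 // 1. log the update first
        Worker probe = workers.get(0);              // 2. select a probe node
        try {
            probe.apply(update);                    // 3. run the update on the probe
            if (!probe.passesControlQueries()) {    // 4. verify consistency via control queries
                disable(probe);
                return;
            }
        } catch (Exception e) {
            disable(probe);
            return;
        }
        // 5. forward the update to the remaining workers
        for (Worker w : workers.subList(1, workers.size())) {
            try {
                w.apply(update);
            } catch (Exception e) {
                disable(w);                         // failed workers go into recovery replication
            }
        }
    }

    private void disable(Worker w) {
        // In the real cluster, the worker would be disabled and scheduled
        // for recovery replication; omitted in this sketch.
    }
}
```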

Adding a worker node¶
Worker node synchronisation¶
When a master with the full transaction history detects a new worker, it simply synchronises the new worker by replaying the updates to it.

Worker node replication¶
When a master with only the recent transaction history detects a new worker, it selects an existing worker and instructs it to copy its data onto the new one.
Note
Masters keep only a limited number of transactions. You can configure how many transactions the master keeps, and for how long, with the LogMaxSize and LogMaxDepth parameters in the JMX console:
- LogMaxSize: the maximum number of transactions stored in the log;
- LogMaxDepth: the maximum number of minutes a transaction is stored in the log.
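These attributes can also be read and changed programmatically through a standard JMX connection. In the sketch below, the JMX service URL, the MBean ObjectName, and the attribute types are assumptions; verify them against the MBeans visible in your master's JMX console:

```java
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClusterLogConfig {
    public static void main(String[] args) throws Exception {
        // Illustrative JMX endpoint of the master; host and port are assumptions.
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://master1:8089/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // The ObjectName is an assumption; look it up in your JMX console.
            ObjectName cluster =
                new ObjectName("ReplicationCluster:name=ClusterInfo/cluster-repo");

            // Attribute types are assumed to be long.
            mbs.setAttribute(cluster, new Attribute("LogMaxSize", 1000L));  // up to 1000 transactions
            mbs.setAttribute(cluster, new Attribute("LogMaxDepth", 60L));   // kept up to 60 minutes

            System.out.println("LogMaxSize  = " + mbs.getAttribute(cluster, "LogMaxSize"));
            System.out.println("LogMaxDepth = " + mbs.getAttribute(cluster, "LogMaxDepth"));
        }
    }
}
```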

Managing an out-of-sync worker¶
Master with two worker nodes¶
When a master in read-write mode detects an out-of-sync worker, it goes into read-only mode. If the master has enough history, it replays the updates to the out-of-sync worker (synchronisation). Otherwise, it selects another (in-sync) worker at random and instructs it to replicate directly to the problem worker. When the worker is back in sync, the cluster (master) goes back to read-write mode.
Master with three worker nodes¶
Tip
Having a cluster with a master and at least three workers ensures that the cluster will remain fully functional in the event of any node failure.
When a master with enough history detects an out-of-sync worker, it replays the updates to the out-of-sync worker (synchronisation). Otherwise, it selects another (in-sync) worker at random and instructs it to replicate directly to the problem worker.

During the replication, the master DOES NOT go into read-only mode. It stays in read-write mode and remains fully functional.

Master peering¶
WRITE requests are divided among the available masters. Each master keeps a log of its own transactions, and the masters then synchronise these logs between each other. Once synchronised, the masters are ready to apply the requests to the workers.

Tip
Having a cluster with two peered masters ensures high-availability without any single point of failure.
Cluster backup¶
The master selects a worker with a full transaction log, switches it to read-only mode, and instructs it to create a backup. After the backup finishes, the read-only worker goes back into read-write mode and synchronises with the other workers.

Tip
During the backup operation, the cluster remains in read-write mode and is able to process updates.