Cluster Topologies
The main factors in choosing a particular cluster topology are the required service availability, the number of concurrent reads, and the need for an off-site hot backup. The GraphDB cluster offers good flexibility for configuring different scenarios. We recommend starting from one of the three core topologies below.
Tip
Use the Client APIs instead of the default RDF4J client API. They handle failures much more gracefully: retrying on the next master, retrying on failure with a delay, automatically switching to a secondary master, and controlling the cluster’s consistency model per query.
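The sketch below illustrates the retry-on-next-master pattern with plain RDF4J, with which GraphDB masters are compatible. The endpoints master1/master2, the repository ID my-repo, and the retry delay are hypothetical placeholders; the Client APIs implement this logic for you, so treat this only as an illustration of the pattern:

    import org.eclipse.rdf4j.query.TupleQueryResult;
    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.RepositoryConnection;
    import org.eclipse.rdf4j.repository.http.HTTPRepository;

    public class FailoverReadExample {

        // Hypothetical master endpoints and repository ID -- adjust to your deployment.
        private static final String[] MASTERS = {"http://master1:7200", "http://master2:7200"};
        private static final String REPOSITORY_ID = "my-repo";
        private static final long RETRY_DELAY_MS = 2000;

        public static void main(String[] args) throws InterruptedException {
            String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
            // Try each master in turn: retry on the next master,
            // with a delay after each failure.
            for (String url : MASTERS) {
                Repository repo = new HTTPRepository(url, REPOSITORY_ID);
                try (RepositoryConnection conn = repo.getConnection();
                        TupleQueryResult result = conn.prepareTupleQuery(query).evaluate()) {
                    while (result.hasNext()) {
                        System.out.println(result.next());
                    }
                    return; // success -- no need to try further masters
                } catch (Exception e) {
                    System.err.println("Query failed on " + url + ": " + e.getMessage());
                    Thread.sleep(RETRY_DELAY_MS); // retry on failure with a delay
                } finally {
                    repo.shutDown();
                }
            }
            throw new IllegalStateException("All masters are unreachable");
        }
    }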
Recommended topologies
The following recommended topologies guarantee availability even in case of failures in the cluster:
Single master with three or more workers
Single master with three or more workers is the simplest cluster topology, optimized for setups with low latency between the individual workers. Since there is only one master (i.e., a single point of failure), the topology is often deployed at a public cloud provider with automatic provisioning: if the master node becomes unresponsive or dies, the cloud infrastructure can stop the machine, mount the master’s file system on another instance, and respawn it (see the health-check sketch after the lists below).
Pros:
The simplest topology to manage with a single master;
Reads scale linearly as additional workers are added;
Automatic failover and cluster recovery if a worker dies.
Cons:
The master is a single point of failure and requires infrastructure for automatically provisioning a new instance;
Optimized for a single data center.
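A minimal sketch of the external health check such infrastructure typically relies on is shown below, assuming GraphDB’s default port 7200 and its RDF4J-compatible /protocol endpoint. The check interval, failure threshold, and the respawn step itself are placeholders for your cloud provider’s tooling:

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class MasterHealthCheck {

        // Hypothetical master endpoint; GraphDB normally listens on port 7200.
        private static final String MASTER_URL = "http://master1:7200/protocol";
        private static final int CHECK_INTERVAL_MS = 5000;
        private static final int FAILURES_BEFORE_RESPAWN = 3;

        public static void main(String[] args) throws InterruptedException {
            int consecutiveFailures = 0;
            while (true) {
                if (isHealthy()) {
                    consecutiveFailures = 0;
                } else if (++consecutiveFailures >= FAILURES_BEFORE_RESPAWN) {
                    // Placeholder: a real deployment would call the cloud provider's
                    // API to stop the instance, re-attach the master's disk to a new
                    // instance, and respawn it.
                    System.err.println("Master unresponsive -- triggering respawn");
                    consecutiveFailures = 0;
                }
                Thread.sleep(CHECK_INTERVAL_MS);
            }
        }

        private static boolean isHealthy() {
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(MASTER_URL).openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                return conn.getResponseCode() == 200;
            } catch (Exception e) {
                return false;
            }
        }
    }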
Two masters sharing workers, one of the masters is read-only
This topology extends the single master with multiple workers by adding a secondary read-only master, which eliminates the need for cloud infrastructure with automatic instance provisioning (or, without it, accepting lower service availability). In this topology, the masters exchange their transaction logs before writing any data to the worker nodes. We highly recommend using the Client APIs or a smart proxy to ensure automatic failover for writes (see the sketch at the end of this section).
Pros:
In case of a primary master failure, the other master continues to serve read queries. It is also possible to manually promote the second master to primary;
Requires fewer database instances than two masters with dedicated workers;
A rolling upgrade is possible with no downtime for either reads or writes.
Cons:
No automatic leader promotion for writes - there is a manual procedure;
No remote data center for disaster recovery purposes.
See Setting up a cluster with two masters with shared workers where one is read-only.
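As a sketch of the write failover that the Client APIs or a smart proxy would perform, the code below retries a write against the primary master with a delay, then falls back to the secondary. The endpoints and repository ID are hypothetical placeholders; note that the secondary keeps rejecting writes until an operator promotes it to primary:

    import org.eclipse.rdf4j.model.ValueFactory;
    import org.eclipse.rdf4j.model.impl.SimpleValueFactory;
    import org.eclipse.rdf4j.repository.Repository;
    import org.eclipse.rdf4j.repository.RepositoryConnection;
    import org.eclipse.rdf4j.repository.http.HTTPRepository;

    public class WriteFailoverExample {

        // Hypothetical endpoints: only the primary master accepts writes;
        // the secondary accepts them only after manual promotion.
        private static final String[] MASTERS = {"http://master1:7200", "http://master2:7200"};
        private static final String REPOSITORY_ID = "my-repo";
        private static final int RETRIES_PER_MASTER = 3;
        private static final long RETRY_DELAY_MS = 2000;

        public static void main(String[] args) throws InterruptedException {
            ValueFactory vf = SimpleValueFactory.getInstance();
            for (String url : MASTERS) {
                for (int attempt = 1; attempt <= RETRIES_PER_MASTER; attempt++) {
                    Repository repo = new HTTPRepository(url, REPOSITORY_ID);
                    try (RepositoryConnection conn = repo.getConnection()) {
                        conn.add(vf.createIRI("urn:example:s"),
                                 vf.createIRI("urn:example:p"),
                                 vf.createLiteral("o"));
                        return; // write accepted
                    } catch (Exception e) {
                        // Retry with a delay; writes to the secondary will keep
                        // failing until it is manually promoted to primary.
                        System.err.printf("Write to %s failed (attempt %d): %s%n",
                                url, attempt, e.getMessage());
                        Thread.sleep(RETRY_DELAY_MS);
                    } finally {
                        repo.shutDown();
                    }
                }
            }
            throw new IllegalStateException("No master accepted the write");
        }
    }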
Multiple masters with dedicated workers
Multiple masters with dedicated workers is a topology optimized for data centers located in different regions, which may experience network delays and temporary packet drops. A single master is primary and all other masters are muted. The primary master is the only node that accepts writes, and it synchronizes the transaction log with the other masters asynchronously. The asynchronous transaction log replication optimizes write speed by eliminating delays caused by network latency (a conceptual sketch follows the lists below). During normal operation, all remote masters should lag by no more than the few latest transactions. In case of a primary master or data center failure, one of the muted masters can take over its role after its flag is manually switched from muted to normal.
Pros:
Multi-data center failover;
Low update latency thanks to asynchronous transaction log synchronization.
Cons:
No automatic leader promotion for writes - there is a manual procedure;
High hardware costs compared to the other topologies.
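The following is a conceptual sketch of asynchronous transaction-log shipping under the behavior described above (commit locally and return immediately, replicate in the background), not GraphDB’s actual internals; the endpoints are hypothetical:

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AsyncLogReplicationSketch {

        private final BlockingQueue<String> transactionLog = new LinkedBlockingQueue<>();
        private final List<String> remoteMasters; // hypothetical remote endpoints

        public AsyncLogReplicationSketch(List<String> remoteMasters) {
            this.remoteMasters = remoteMasters;
            Thread replicator = new Thread(this::replicateForever, "log-replicator");
            replicator.setDaemon(true);
            replicator.start();
        }

        // The write path: apply locally, enqueue for replication, return.
        // Network latency to remote data centers never delays the client.
        public void commit(String transaction) {
            // ... apply the transaction to the local workers here ...
            transactionLog.add(transaction);
        }

        private void replicateForever() {
            try {
                while (true) {
                    String tx = transactionLog.take();
                    for (String master : remoteMasters) {
                        // In a real system this would be an HTTP/RPC call; remote
                        // masters may lag by the few transactions still queued here.
                        System.out.printf("Shipping %s to %s%n", tx, master);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            AsyncLogReplicationSketch primary = new AsyncLogReplicationSketch(
                    List.of("http://master2:7200", "http://master3:7200"));
            primary.commit("tx-1"); // returns immediately; replication runs in the background
            Thread.sleep(100);      // give the daemon thread a moment before exiting
        }
    }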
Topologies not recommended for a production system
The following topologies are possible, but not recommended for a production system because they cannot guarantee high availability. In the event of a node failure or restart, the cluster may reject read or write operations.
Master with a single worker
This setup is suitable only for testing purposes. During a backup operation, the cluster remains in a read-only state, and none of the cluster nodes is redundant, which increases the chance of failure without adding any value.
Master with two workers
Although there are two redundant workers, the cluster cannot:
Perform a backup after one of the workers dies without temporarily losing write operations;
Recover the cluster by joining a new worker after an existing worker dies without temporarily losing write operations.