Cluster topologies¶
The main factors for choosing a particular cluster topology are the target service availability, the number of concurrent reads, and the need for an off-site hot backup. The GraphDB cluster offers good flexibility in configuring different scenarios. We recommend starting from one of the three core topologies.
Tip
Use the ClientAPIs instead of the default RDF4J client API. They handle failures much more gracefully: retrying on the next master, retrying after a delay on failure, automatically switching to a secondary master, and controlling the cluster’s consistency model per query.
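To make this concrete, here is a minimal failover sketch in plain Java with RDF4J, assuming two masters at placeholder URLs (master1, master2) and a repository named myrepo. The real ClientAPIs provide this behaviour out of the box, together with retry delays and per-query consistency control.

```java
import org.eclipse.rdf4j.query.QueryEvaluationException;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.RepositoryException;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class FailoverQuery {

    public static void main(String[] args) {
        // Placeholder master endpoints - replace with your own deployment.
        Repository[] masters = {
                new HTTPRepository("http://master1:7200/repositories/myrepo"),
                new HTTPRepository("http://master2:7200/repositories/myrepo")
        };
        String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        RuntimeException lastError = null;
        for (Repository master : masters) {
            try (RepositoryConnection conn = master.getConnection();
                 TupleQueryResult result = conn.prepareTupleQuery(sparql).evaluate()) {
                while (result.hasNext()) {
                    System.out.println(result.next());
                }
                return; // success - no need to try the next master
            } catch (RepositoryException | QueryEvaluationException e) {
                lastError = e; // this master is unreachable - retry on the next one
            }
        }
        throw new IllegalStateException("All masters failed", lastError);
    }
}
```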
Recommended topologies¶
The following recommended topologies guarantee availability even in case of failures in the cluster.
Single master with three or more workers¶
Single master with three or more workers is the simplest cluster topology, optimised for setups with low latency between the individual workers. Since there is only one master (i.e., a single point of failure), this topology is often implemented in a public cloud with automatic provisioning. If the master node becomes unresponsive or dies, the cloud infrastructure can stop the machine, mount the master’s file system on another instance, and respawn the master there; a watchdog sketch for detecting such a failure follows the lists below.
Pros:
- The simplest topology to manage with a single master;
- Guarantees linear scalability of the reads by adding additional workers;
- Automatic failover and cluster recovery if a worker dies;
Cons:
- Master1 is a single point of failure and requires infrastructure for automatic provisioning of a new instance;
- Optimised for a single data center;
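The re-provisioning itself is specific to the cloud provider, but detecting a dead master can be as simple as polling it over HTTP. Below is a hedged watchdog sketch in Java, assuming the standard RDF4J /protocol endpoint that GraphDB exposes; respawnMaster() is a hypothetical hook standing in for the provider-specific stop, re-mount, and respawn sequence.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class MasterWatchdog {

    // Placeholder URL of the master's RDF4J protocol endpoint.
    private static final String MASTER_HEALTH_URL = "http://master1:7200/protocol";

    private static final HttpClient HTTP = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            if (!isAlive()) {
                respawnMaster();
            }
            Thread.sleep(Duration.ofSeconds(30).toMillis());
        }
    }

    private static boolean isAlive() {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(MASTER_HEALTH_URL))
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();
            HttpResponse<String> response = HTTP.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (Exception e) {
            return false; // unreachable counts as dead
        }
    }

    private static void respawnMaster() {
        // Hypothetical hook: call the cloud provider's API here, e.g. stop the
        // instance, attach the master's volume to a fresh one, and start it.
        System.err.println("Master unresponsive - triggering re-provisioning");
    }
}
```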
Two masters sharing workers, one of the masters is read-only¶
This topology is an extension of the single master with multiple workers. It adds a secondary read-only master, which eliminates the need for cloud infrastructure with automatic instance provisioning and avoids the lower service availability of a single master. In this topology, the masters exchange their transaction logs before writing any data to the worker nodes. We highly recommend using the ClientAPIs or a smart proxy to ensure automatic failover for the writes (see the read/write-split sketch at the end of this subsection).
Pros:
- In case of a primary master failure, the other master serves the read queries, and the second master can be manually promoted to primary;
- Requires fewer database instances than two masters with dedicated workers;
- Rolling upgrade is possible with no downtime for both reads and writes.
Cons:
- No automatic leader promotion for writes - there is a manual procedure;
- No remote data center for disaster recovery purposes.
See Setting up a cluster with two masters with shared workers, one of which is read-only.
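As an illustration of the read/write split this topology implies, here is a sketch in plain Java with RDF4J, assuming placeholder endpoints master1 (read-write) and master2 (read-only). Writes always target the read-write master, while reads prefer the read-only one and fall back to the primary when it is down.

```java
import java.io.IOException;
import java.io.StringReader;

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;
import org.eclipse.rdf4j.rio.RDFFormat;

public class TwoMasterClient {

    // Placeholder endpoints for the read-write and the read-only master.
    private final Repository readWriteMaster =
            new HTTPRepository("http://master1:7200/repositories/myrepo");
    private final Repository readOnlyMaster =
            new HTTPRepository("http://master2:7200/repositories/myrepo");

    /** Writes always go to the single read-write master. */
    public void write(String turtleData) throws IOException {
        try (RepositoryConnection conn = readWriteMaster.getConnection()) {
            conn.add(new StringReader(turtleData), "", RDFFormat.TURTLE);
        }
    }

    /** Reads prefer the read-only master and fall back to the primary. */
    public long countStatements() {
        for (Repository repo : new Repository[]{readOnlyMaster, readWriteMaster}) {
            try (RepositoryConnection conn = repo.getConnection()) {
                return conn.size();
            } catch (RuntimeException e) {
                // this master is down - try the other one
            }
        }
        throw new IllegalStateException("Both masters are unreachable");
    }
}
```

After a failure of the primary master, the write path would be repointed at the secondary once it has been manually switched from read-only to read-write.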
Multiple masters with dedicated workers¶
Multiple masters with dedicated workers is a topology optimised for data centers located in different regions, which may experience network delays and temporary packet drops. A single master is primary and all other masters are muted. The primary master is the only node that accepts writes, and it synchronises the transaction log with the other masters asynchronously. The asynchronous transaction log replication optimises the write speed by eliminating all delays caused by network latency. During normal operation, all remote masters should lag by no more than a few of the latest transactions. In case of a primary master or data center failure, one of the muted masters may take its role after its flag is manually switched from muted to normal (a hedged promotion sketch follows the lists below).
Pros:
- Multi-data center failover;
- Low latency of the updates with asynchronous transaction log synchronisations;
Cons:
- No automatic leader promotion for writes - there is a manual procedure;
- High hardware costs compared to the other topologies;
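The manual promotion is an administrative call against the surviving master. The sketch below is intentionally hedged: the /rest/cluster/... endpoint and the JSON body are assumptions made for illustration only - consult the cluster management documentation of your GraphDB version for the exact REST or JMX call that switches a master from muted to normal.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PromoteMaster {

    public static void main(String[] args) throws Exception {
        // HYPOTHETICAL endpoint and body - verify against your GraphDB
        // version's cluster management API before using.
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://master2:7200/rest/cluster/masters/master2"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"muted\": false}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```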
Not recommended topologies¶
The following topologies do not guarantee high availability of the cluster and failover. It is highly recommended to add more nodes to prevent possible downtime (rejected reads/writes) in case of failures.
Master with a single worker¶
If either of the two nodes dies, you have to rebuild the cluster from a backup or a dump.
Master with two workers¶
If any of the following occurs, the cluster is at risk of downtime for the writes:
- A backup is started and then the other worker dies;
- One of the workers goes out of sync - it is impossible to replicate it without downtime;
- One of the workers goes OFF - you cannot join other workers to the cluster.