Clusters

Note

This feature requires a GraphDB Enterprise license.

A key to GraphDB’s scalability is its ability to run clusters. A cluster is a collection of interconnected GraphDB instances, each assigned a unique URL address and running on a separate machine, that work together as a coherent group. With this arrangement, a GraphDB cluster can survive the failure of some of its members. You can create a cluster through the GraphDB Workbench or by using the REST API. The Workbench also provides a Cluster view, which visualizes the current state of the cluster and its nodes.
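
For example, a cluster can be created programmatically through the REST API by posting the addresses of the member nodes to the cluster configuration endpoint. The following Python sketch assumes three GraphDB instances whose RPC addresses are graphdb-node1:7300, graphdb-node2:7300, and graphdb-node3:7300, and uses the /rest/cluster/config endpoint; the exact endpoint and configuration fields may vary between GraphDB versions, so consult the REST API reference of your release.

    import requests

    # Hypothetical RPC addresses of the nodes that will form the cluster;
    # replace them with the addresses of your own GraphDB instances.
    cluster_config = {
        "nodes": [
            "graphdb-node1:7300",
            "graphdb-node2:7300",
            "graphdb-node3:7300",
        ]
    }

    # The request can be sent to any of the future cluster nodes.
    response = requests.post(
        "http://graphdb-node1:7200/rest/cluster/config",
        json=cluster_config,
        timeout=60,
    )
    response.raise_for_status()
    print("Cluster created, HTTP status:", response.status_code)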

The cluster is an essential tool for building a reliable large-scale system and achieving high availability. In other words, a cluster deployment has a high uptime and recovers smoothly from failures, greatly reduces the chance of data loss, and overall, handles and adapts to unexpected situations and scenarios.

In order for the cluster to process INSERT and DELETE operations, more than half of its nodes (a majority quorum) must be alive. If 50% or more of the nodes are dead, the cluster will not be able to process INSERT or DELETE operations. You can learn more about this in the Quorum-based replication section of the Advanced cluster documentation.

See also

The Raft consensus algorithm

The GraphDB cluster is based on a coordination mechanism known as the Raft consensus algorithm. Raft allows a collection of machines to work as a coherent group and survive the failure of some of its members. It implements consensus by first electing one of the instances as a leader and then giving that leader complete responsibility for managing the replicated log that makes node redundancy possible.

The leader accepts log entries from clients, replicates them to the other servers, and tells the servers when it is safe to apply log entries to their state machines. A cluster of \(n\) servers can tolerate \(f\) failures as long as \(n \geq 2f + 1\). Because of this, Raft plays a key role in achieving an enterprise-level highly available cluster deployment in GraphDB.
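
In practical terms, the largest number of failed nodes \(f\) that a cluster of \(n\) nodes can survive is

\[
f = \left\lfloor \frac{n - 1}{2} \right\rfloor
\]

so a three-node cluster tolerates one failed node, and a five-node cluster tolerates two.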

GraphDB modifies and extends the original Raft protocol with several improvements necessary for a higher level of consistency in the cluster, while the leader election and automatic failover mechanisms provided by Raft are implemented without changes. The main modifications are:

  • An update is considered committed when a majority of GraphDB instances have actually committed it locally, rather than simply stored it in their transaction logs.

  • Followers also verify with the leader whether a received entry has been processed by the rest of the cluster.

  • An additional OUT_OF_SYNC state has been introduced for faulty nodes that require recovery from other cluster nodes with a correct state.

  • Node snapshot replication, leveraging Raft’s log index and replication mechanism, has been introduced to recover out-of-sync nodes.

Cluster roles

The total number of leader and follower nodes is usually odd in order to tolerate failures. At any given time, each node is in one of four states:

  • Leader: Usually, there is one leader that handles all client requests. For example, if a client contacts a follower, the follower redirects it to the leader.

  • Follower: A cluster is made up of one leader, and all other servers are followers. Followers are passive: they issue no requests on their own but simply respond to requests from leaders and candidates.

  • Candidate: This state is used when electing a new leader.

  • Restricted: In this state, the node cannot respond to requests from other nodes and cannot participate in elections. A node goes into this state when there is a license issue, for example an invalid or expired license.

Note

To learn more about the different states a node can take, check the Cluster states section in our documentation.
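
As a quick illustration, the state of an individual node can be inspected over the REST API. The sketch below assumes a node running at localhost:7200 and the /rest/cluster/node/status endpoint from the REST API reference; the exact endpoint and response fields may differ between GraphDB versions.

    import requests

    # Ask a single node (assumed to run at localhost:7200) for its cluster status.
    response = requests.get(
        "http://localhost:7200/rest/cluster/node/status",
        timeout=10,
    )
    response.raise_for_status()

    status = response.json()
    # The response is expected to report the node's current state, e.g. LEADER,
    # FOLLOWER, CANDIDATE, RESTRICTED, or OUT_OF_SYNC. The field name below is
    # an assumption; check the REST API reference for your GraphDB version.
    print("Node state:", status.get("nodeState"))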

Internal and external proxy

Internal proxy

Under normal working conditions, the cluster nodes are in one of two states: leader and follower. The follower nodes can accept read requests, but cannot write any data. To make communicating with the cluster easier, an integrated proxy redirects all requests (with some exceptions) to the leader node. This ensures that regardless of which cluster node is reached, it can accept all user requests.

However, if a GraphDB cluster node is unavailable, you need to switch to another cluster node that will be on another URL. This means that you need to know all cluster node addresses and make sure that the reached node is healthy and online.
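
In practice, this means that a client can send a SPARQL update to whichever cluster node it knows about and let the internal proxy forward it to the leader. The sketch below assumes a repository named my_repo and a node reachable at graphdb-node2:7200; it uses the standard RDF4J-style statements endpoint for SPARQL updates.

    import requests

    # The update can be sent to any healthy cluster node; the internal proxy
    # redirects write requests to the current leader.
    update = 'INSERT DATA { <urn:example:s> <urn:example:p> "hello" }'

    response = requests.post(
        "http://graphdb-node2:7200/repositories/my_repo/statements",
        data={"update": update},
        timeout=30,
    )
    response.raise_for_status()
    print("Update accepted, HTTP status:", response.status_code)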

External proxy

For even better usability, the proxy can be deployed separately on its own URL. This way, you do not need to know where all cluster nodes are. Instead, there is a single URL that will always point to the leader node.

The externally deployed proxy behaves like a regular GraphDB instance, including opening and using the Workbench. It always knows which node is the current leader and always forwards all requests to it.

High availability and fault-tolerance

A system can be characterized as highly available if it meets several key criteria, such as high uptime, smooth recovery, zero data loss, and the ability to handle and adapt to unexpected situations and scenarios. The Raft consensus algorithm helps meet these criteria and achieve an enterprise-grade highly available deployment.

What is consensus?

One of the main problems of distributed systems is achieving consensus between their servers, that is, getting the multiple individual machines that make up the deployment to agree on values. Once those machines agree on a value, that decision is final. With most typical consensus algorithms, a deployment can continue operating as long as a majority of its servers are available. For example, a cluster of five servers will continue working as long as at least three of its servers are operational. However, if three or more of the servers fail, the cluster will stop accepting INSERT/DELETE operations, although it will never return an incorrect value.

Consensus algorithms aim to be fault-tolerant, where faults can be classified in two categories:

  • Crash failure: The component abruptly stops functioning and does not resume. The other components can detect the crash and adjust their local decisions in time.

  • Byzantine failure: The component behaves arbitrarily, without any constraints. It can send contradictory messages to other components or simply remain silent, and it may appear normal from the outside.

Consensus is usually associated with replicated state machines, a general approach to building fault-tolerant systems. In a replicated state machine, each instance has a log. A consensus algorithm is thus used to agree on the commands in the servers’ logs. The consensus algorithm ensures that each state machine processes the same series of commands and thus produces the same series of results, arriving at the same series of states.
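
The following minimal Python sketch (not GraphDB code) illustrates the idea: as long as every replica applies the same agreed-upon log of commands in the same order, all replicas end up in the same state.

    # Minimal replicated state machine sketch: each replica applies the same
    # command log in the same order and therefore reaches the same state.
    def apply_log(log):
        state = {}
        for op, key, value in log:
            if op == "insert":
                state[key] = value
            elif op == "delete":
                state.pop(key, None)
        return state

    # The command log agreed upon by the consensus algorithm.
    agreed_log = [
        ("insert", "s1", "v1"),
        ("insert", "s2", "v2"),
        ("delete", "s1", None),
    ]

    # Three replicas applying the same log produce identical states.
    replicas = [apply_log(agreed_log) for _ in range(3)]
    assert replicas[0] == replicas[1] == replicas[2]
    print(replicas[0])  # {'s2': 'v2'}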

Note

In addition to using the Raft consensus algorithm, GraphDB guards against Byzantine failures by stopping all write operations while a node is being added to the cluster. You can learn more about adding nodes to the cluster in the Manage cluster membership section of the documentation.

High availability deployment

A typical scenario is a deployment in a cloud infrastructure that allows GraphDB instances to be placed in different regions or zones, so that if a region or zone fails, the GraphDB cluster continues functioning without any issues for the end user.

To achieve high availability, we recommend deploying GraphDB instances in different zones or regions while keeping in mind that a majority quorum is needed to accept INSERT/DELETE requests. This means that the deployment should always have over 50% of its instances running.

Another recommendation is to distribute the GraphDB instances so that you do not have exactly half of them in one zone and the other half in another, as losing either zone would then also lose the majority quorum. In such cases, it is better to use three zones.

Cluster group with three nodes

In a cluster with three nodes, we need at least two in order to be able to write data successfully. In this case, the best deployment strategy is to have three GraphDB instances distributed in three zones in the same region. This way, if one zone fails, the other two instances will still form a quorum and the cluster will accept all requests.

Note

Having the instances in different regions may introduce latency.

Cluster group with five nodes

In a cluster with five nodes, we need three nodes for a quorum. If we have three available regions/zones, we can deploy:

  • two instances in zone 1,

  • two instances in zone 2,

  • one instance in zone 3.

If any single zone fails, at least three GraphDB instances remain, which is enough to form a quorum.
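
A quick sanity check of this layout (illustrative Python, not GraphDB code) confirms that losing any single zone still leaves a majority of the five nodes alive:

    # Five-node cluster distributed across three zones (2 + 2 + 1).
    zones = {"zone-1": 2, "zone-2": 2, "zone-3": 1}
    total_nodes = sum(zones.values())
    quorum = total_nodes // 2 + 1  # majority quorum: 3 out of 5

    for failed_zone, lost in zones.items():
        alive = total_nodes - lost
        verdict = "kept" if alive >= quorum else "lost"
        print(f"{failed_zone} down: {alive} nodes alive, quorum {verdict}")
    # Every single-zone failure leaves at least 3 alive nodes, so INSERT/DELETE
    # operations are still accepted.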

Quorum-based replication

The GraphDB cluster relies on quorum-based replication, meaning that more than 50% of the cluster’s nodes must be alive in order to execute INSERT/DELETE operations. This ensures that a majority of GraphDB nodes always have up-to-date data.

If there are unavailable nodes when an INSERT/DELETE operation is executed, but there are more than 50% alive nodes, the request will be accepted, distributed among the reachable alive nodes, and saved if everything is OK. Once the unavailable nodes come back online, the transactions will be distributed to them as well.

If 50% or fewer of the nodes are available, any INSERT/DELETE operations will be rejected.

Query load balancer

In order to achieve maximum efficiency, the GraphDB cluster distributes the incoming read queries across all nodes, prioritizing the ones that have fewer running queries. This ensures optimal hardware resource utilization of all nodes.
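
The strategy can be pictured with a small sketch (illustrative only, not GraphDB’s internal implementation): among the readable nodes, the next read query goes to the node with the fewest currently running queries.

    # Illustrative least-loaded routing: the next read query is sent to the
    # readable node with the fewest currently running queries.
    running_queries = {
        "graphdb-node1": 4,
        "graphdb-node2": 1,
        "graphdb-node3": 2,
    }

    def pick_node(load):
        return min(load, key=load.get)

    print("Route next read query to:", pick_node(running_queries))  # graphdb-node2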

Local consistency

GraphDB supports two types of local consistency: None and Last Committed.

  • None is the default setting and is used when no local consistency is needed. In this mode, the query will be sent to any readable node, without any guarantee of strong consistency. This is suitable for cases where eventual consistency is sufficient or when enforcing strong consistency is too costly.

  • Last Committed is used when strong consistency is required. It ensures that the results reflect the state of the system after all transactions have been committed; however, it could lead to lower scalability, as the set of nodes to which a query can be load-balanced is smaller. In this mode, the query will be sent to a readable node that has advanced to the last committed transaction.

The choice between None and Last Committed depends on the specific requirements and constraints of the application and use case. In general, if query results should always reflect the up-to-date state of the database, Last Committed should be used. Otherwise, None is sufficient.
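
The difference between the two modes can be sketched as a node-selection rule (illustrative Python only; the actual mechanism is internal to GraphDB): with None any readable node may answer, while with Last Committed only the nodes that have already applied the latest committed transaction qualify.

    # Illustrative node selection for the two local consistency modes.
    # Each node is described by the ID of the last transaction it has applied.
    nodes = {
        "graphdb-node1": 105,  # up to date
        "graphdb-node2": 105,  # up to date
        "graphdb-node3": 103,  # still catching up
    }
    last_committed_txn = 105

    def eligible_nodes(consistency):
        if consistency == "none":
            # Any readable node may answer; eventual consistency is acceptable.
            return list(nodes)
        if consistency == "last_committed":
            # Only nodes that have applied the last committed transaction qualify.
            return [n for n, txn in nodes.items() if txn >= last_committed_txn]
        raise ValueError(consistency)

    print(eligible_nodes("none"))            # all three nodes
    print(eligible_nodes("last_committed"))  # only the up-to-date nodes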