GraphDB EE

GraphDB EE is a high-performance, clustered semantic repository proven to scale in production environments where simultaneous loading, querying and inferencing of tens of billions of RDF statements occur in real time. It supports automatic failover, synchronisation and load balancing to maximise cluster utilisation.

A GraphDB EE cluster is organised as one or more master nodes that manage one or more worker nodes. Failover and load balancing between worker nodes are automatic. Multiple master nodes ensure continuous cluster performance, even in the event of a master node failure. The cluster deployment can be modified while it is running, which allows worker nodes to be added during peak times, or released for maintenance, backup, etc.

Structure

The GraphDB EE cluster is made up of two basic node types: masters and workers.

Figure: GraphDB EE cluster diagram (GraphDB_Cluster0.png)

Master node

The master node manages and distributes atomic requests (query evaluations and update transactions) to a set of workers. It does not store any RDF data itself and therefore requires only limited resources.

Hint

From an external point of view, a master behaves exactly like any other RDF4J/GraphDB repository that is exposed via the RDF4J HTTP server, but utilises worker nodes (also exposed via an RDF4J HTTP server) to store and manage RDF data. In this way, query throughput can be increased by having worker nodes answer queries in parallel.
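Because a master is exposed like any other RDF4J repository, a client can address it in the usual RDF4J REST style. The following is a minimal sketch, not a definitive client: the host `master1.example.com:7200`, the repository ID `cluster-master` and the helper name `build_query_request` are hypothetical, and the base URL of the RDF4J HTTP server depends on the deployment. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(base_url: str, repository_id: str, sparql: str) -> Request:
    """Build (but do not send) a SPARQL query request in the RDF4J REST style."""
    # RDF4J REST convention: GET <base>/repositories/<id>?query=<SPARQL>
    query_string = urlencode({"query": sparql})
    url = f"{base_url.rstrip('/')}/repositories/{repository_id}?{query_string}"
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical master address and repository ID, for illustration only.
request = build_query_request(
    "http://master1.example.com:7200",
    "cluster-master",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
)
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the query; the client code is the same whether the URL points at a master or at a standalone repository.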

The master node is responsible for:

  • coordinating all read and write operations;
  • ensuring that all worker nodes are synchronised;
  • propagating updates (insert and delete tasks) across all workers and checking updates for inconsistencies;
  • load balancing read requests (query execution tasks) between all available worker nodes;
  • providing a uniform entry point for client software, where the client interacts with the cluster as though it is just a normal RDF4J repository;
  • providing a JMX interface for monitoring and administering the cluster;
  • automatic cluster re-configuration in the event of failure of one or more worker nodes;
  • user-directed dynamic configuration of the cluster to add/remove worker nodes.
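The responsibilities listed above can be pictured with a small in-memory model. This is a conceptual sketch only: the class `ToyMaster`, the round-robin read policy and the copy-based synchronisation of a new worker are assumptions made for illustration, and do not describe how GraphDB EE actually implements load balancing or synchronisation.

```python
from itertools import count


class ToyMaster:
    """Conceptual model of a master: stores no data, only routes requests."""

    def __init__(self, workers):
        # Each worker is modelled as a named set of statements.
        self.workers = dict(workers)
        self._turn = count()

    def add_worker(self, name):
        # A new worker must be synchronised before it serves reads; here
        # we simply copy the statements held by an existing worker.
        source = next(iter(self.workers.values()), set())
        self.workers[name] = set(source)

    def remove_worker(self, name):
        # Models both user-directed removal and dropping a failed worker.
        self.workers.pop(name, None)

    def update(self, inserts=(), deletes=()):
        # Writes are propagated to every worker so that all stay in sync.
        for statements in self.workers.values():
            statements.difference_update(deletes)
            statements.update(inserts)

    def query(self, predicate):
        # Reads go to a single worker, chosen round-robin (an assumption).
        if not self.workers:
            raise RuntimeError("no workers available")
        names = sorted(self.workers)
        name = names[next(self._turn) % len(names)]
        return name, {s for s in self.workers[name] if predicate(s)}


master = ToyMaster({"worker-a": set(), "worker-b": set()})
master.update(inserts={("ex:alice", "ex:knows", "ex:bob")})
first, _ = master.query(lambda s: True)
second, _ = master.query(lambda s: True)
print(first, second)
```

In this model, removing a worker only changes the set of read targets; because every update reaches every worker, any remaining worker can answer any query.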

Tip

A cluster can contain more than one master node. In that case, every master monitors the health of all workers and can distribute query execution tasks between them, effectively allowing a cluster to have multiple entry points for queries.
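With several masters acting as entry points, a client may want to fall back to another master when one is unreachable. How clients choose a master is not specified here, so the following is only one possible sketch: the function `pick_master`, the probe callable `is_up` and the example URLs are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def pick_master(masters: Iterable[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first master URL for which the health probe succeeds."""
    for url in masters:
        try:
            if is_up(url):
                return url
        except OSError:
            # Treat connection errors as "this master is down" and move on.
            continue
    return None


# Stub probe standing in for a real HTTP health check (illustration only).
reachable = {"http://master2.example.com:7200"}
chosen = pick_master(
    ["http://master1.example.com:7200", "http://master2.example.com:7200"],
    lambda url: url in reachable,
)
print(chosen)
```

Taking the probe as a parameter keeps the selection logic independent of any particular HTTP client, so it can be exercised without a running cluster.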

Worker nodes

Worker nodes are based on the same technology as GraphDB SE repositories. They are accessible through the master node via the HTTP protocol of the exported SPARQL endpoint of the RDF4J service.

Worker nodes require substantial resources, as they are responsible for:

  • storing all information;
  • executing all read/write operations.