Architecture & components

Architecture

GraphDB is packaged as a Storage and Inference Layer (SAIL) for Sesame and makes extensive use of the features and infrastructure of Sesame, especially the RDF model, RDF parsers and query engines.

Inference is performed by the Reasoner (TRREE Engine), where the explicit and inferred statements are stored in highly-optimised data structures that are kept in-memory for query evaluation and further inference. The inferred closure is updated through inference at the end of each transaction that modifies the repository.

GraphDB implements the SAIL API so that it can be integrated with the rest of the Sesame framework, e.g., the query engines and the web UI. A user application can be designed to use GraphDB directly through the Sesame SAIL API or via the higher-level functional interfaces. When a GraphDB repository is exposed using the Sesame HTTP Server, users can manage the repository through the embedded Workbench, the Sesame Workbench, or other tools integrated with Sesame.

_images/GraphDB_High-level_architecture.png

GraphDB High-level Architecture

Sesame

Sesame is a framework for storing, querying and reasoning with RDF data. It is implemented in Java by Aduna as an open source project and includes various storage back-ends (memory, file, database), query languages, reasoners and client-server protocols.

There are essentially two ways to use Sesame:

  • as a standalone server;
  • embedded in an application as a Java library.

Sesame supports the W3C SPARQL query language. It also supports the most popular RDF file formats and query result formats.

Sesame offers a JDBC-like user API, streamlined system APIs and a RESTful HTTP interface. Various extensions are available or are being developed by third parties.

Sesame Architecture

The following is a schematic representation of Sesame’s architecture and a brief overview of the main components.

_images/sesame_architecture.png

The Sesame architecture (reproduced from the Sesame documentation)

The Sesame framework is a loosely coupled set of components, where alternative implementations can be easily exchanged. Sesame comes with a variety of Storage And Inference Layer (SAIL) implementations that a user can select for the desired behaviour (in-memory storage, file system, relational database, etc.). GraphDB is a plugin SAIL component for the Sesame framework.

Applications will normally communicate with Sesame through the Repository API. This provides a high enough level of abstraction so that the details of particular underlying components remain hidden, i.e., different components can be swapped without requiring modification of the application.

The Repository API has several implementations, one of which uses HTTP to communicate with a remote repository that exposes the Repository API via HTTP.

The SAIL API

The SAIL API is a set of Java interfaces that support RDF storing, retrieving, deleting and inferencing. It is used for abstracting from the actual storage mechanism, e.g., an implementation can use relational databases, file systems, in-memory storage, etc. Its main characteristics are:

  • flexibility and freedom for optimisations so that huge amounts of data can be handled efficiently on enterprise-level machines;
  • extensibility to other RDF-based languages;
  • stacking of SAILs;
  • concurrency control for any type of repository.
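The idea of stacking can be illustrated with a small sketch in which one layer wraps another and adds behaviour on top of it. The interface and class names below (TripleStoreLayer, InMemoryLayer, SymmetricKnowsLayer) and the ex:knows rule are hypothetical and greatly simplified; they are not the real SAIL interfaces, only an assumption-laden model of how a stacked layer can delegate storage to the layer beneath it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a storage layer (not the real SAIL API).
interface TripleStoreLayer {
    void add(String s, String p, String o);
    boolean contains(String s, String p, String o);
}

/** Bottom of the stack: keeps statements in memory. */
class InMemoryLayer implements TripleStoreLayer {
    private final Set<List<String>> statements = new HashSet<>();

    @Override
    public void add(String s, String p, String o) {
        statements.add(Arrays.asList(s, p, o));
    }

    @Override
    public boolean contains(String s, String p, String o) {
        return statements.contains(Arrays.asList(s, p, o));
    }
}

/** Stacked layer: delegates storage downwards and adds one toy inference rule. */
class SymmetricKnowsLayer implements TripleStoreLayer {
    private final TripleStoreLayer base;

    SymmetricKnowsLayer(TripleStoreLayer base) {
        this.base = base;
    }

    @Override
    public void add(String s, String p, String o) {
        base.add(s, p, o);
        if (p.equals("ex:knows")) {
            base.add(o, p, s); // hypothetical rule: ex:knows is symmetric
        }
    }

    @Override
    public boolean contains(String s, String p, String o) {
        return base.contains(s, p, o);
    }
}
```

Because both classes implement the same interface, an application that only sees TripleStoreLayer does not need to know whether it talks to the plain store or to a stack of layers.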

Components

Engine

Query optimiser

The query optimiser attempts to determine the most efficient way to execute a given query by considering the possible query plans. After a query is submitted and parsed, it is passed to the query optimiser, which selects the plan to execute. GraphDB allows hints for guiding the query optimiser.
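As a rough illustration of what "considering the possible query plans" can mean, the sketch below enumerates every join order of a few triple patterns and keeps the cheapest one under a toy cost model. The class name, the cost function and the cardinality estimates are all assumptions made for this example; GraphDB's actual optimiser and its statistics are not described here and will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of plan selection; not GraphDB's optimiser.
class PlanChooserSketch {

    /**
     * Toy cost: the sum of intermediate result sizes when patterns are joined
     * left to right, assuming (unrealistically) that each join multiplies sizes.
     */
    static long cost(List<Integer> order, long[] estimates) {
        long total = 0;
        long running = 1;
        for (int index : order) {
            running *= estimates[index];
            total += running;
        }
        return total;
    }

    /** Returns the pattern indices in the order with the lowest toy cost. */
    static List<Integer> cheapestOrder(long[] estimates) {
        List<Integer> best = new ArrayList<>();
        permute(new ArrayList<>(), new boolean[estimates.length], estimates, best);
        return best;
    }

    private static void permute(List<Integer> prefix, boolean[] used,
                                long[] estimates, List<Integer> best) {
        if (prefix.size() == estimates.length) {
            // A complete plan: keep it if it beats the best plan seen so far.
            if (best.isEmpty() || cost(prefix, estimates) < cost(best, estimates)) {
                best.clear();
                best.addAll(prefix);
            }
            return;
        }
        for (int i = 0; i < estimates.length; i++) {
            if (used[i]) {
                continue;
            }
            used[i] = true;
            prefix.add(i);
            permute(prefix, used, estimates, best);
            prefix.remove(prefix.size() - 1); // removes by index, undoing the choice
            used[i] = false;
        }
    }
}
```

Exhaustive enumeration is only practical for a handful of patterns; it is used here because it makes the "compare plans, keep the cheapest" idea explicit.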

Reasoner (TRREE Engine)

GraphDB is implemented on top of the TRREE engine. TRREE stands for ‘Triple Reasoning and Rule Entailment Engine’. The TRREE performs reasoning based on forward-chaining of entailment rules over RDF triple patterns with variables. TRREE’s reasoning strategy is total materialisation, although various optimisations are used. Further details of the rule language can be found in the Reasoning section.
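Forward-chaining with total materialisation can be sketched as a loop that applies rules to the known statements until nothing new is produced. The example below is a naive illustration, not the TRREE implementation: the class name is hypothetical, statements are plain string triples, and the only two rules are hard-coded versions of the RDFS subclass rules (type inheritance and subClassOf transitivity) rather than anything written in GraphDB's rule language.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of naive forward-chaining; not the TRREE engine.
class MaterialisationSketch {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** Applies the two rules repeatedly until a fixpoint is reached. */
    static Set<List<String>> materialise(Set<List<String>> explicit) {
        Set<List<String>> closure = new HashSet<>(explicit);
        boolean changed = true;
        while (changed) {
            List<List<String>> inferred = new ArrayList<>();
            for (List<String> a : closure) {
                for (List<String> b : closure) {
                    // Both rules join on: object of a = subject of b, b is a subClassOf statement.
                    if (!b.get(1).equals(SUBCLASS) || !a.get(2).equals(b.get(0))) {
                        continue;
                    }
                    if (a.get(1).equals(TYPE)) {
                        // x type C, C subClassOf D  =>  x type D
                        inferred.add(triple(a.get(0), TYPE, b.get(2)));
                    } else if (a.get(1).equals(SUBCLASS)) {
                        // C subClassOf D, D subClassOf E  =>  C subClassOf E
                        inferred.add(triple(a.get(0), SUBCLASS, b.get(2)));
                    }
                }
            }
            // addAll returns true only if at least one statement was new.
            changed = closure.addAll(inferred);
        }
        return closure;
    }
}
```

The result contains both the explicit and the inferred statements, so a query can be answered by lookup alone, which is the trade-off total materialisation makes: more work at update time in exchange for no reasoning at query time.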

Storage

GraphDB stores all of its data in files in the configured storage directory, usually called ‘storage’. The storage consists of two main statement indices (POS and PSO), two context indices (PSCO and POCS), a literal index and a page cache.
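To show why keeping the same statements in two sort orders is useful, the following toy model holds triples of entity IDs in two sorted sets, one ordered predicate-object-subject and one ordered predicate-subject-object. It is an in-memory sketch under stated assumptions (the class and method names are hypothetical) and says nothing about GraphDB's actual file format or page layout.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory model of POS and PSO orderings; not GraphDB's storage.
class StatementIndexSketch {
    // Each statement is an int[]{subject, predicate, object} of entity IDs.
    private static final int S = 0, P = 1, O = 2;

    private final TreeSet<int[]> pos = new TreeSet<>(order(P, O, S));
    private final TreeSet<int[]> pso = new TreeSet<>(order(P, S, O));

    private static Comparator<int[]> order(int first, int second, int third) {
        return Comparator.<int[]>comparingInt(t -> t[first])
                .thenComparingInt(t -> t[second])
                .thenComparingInt(t -> t[third]);
    }

    void add(int s, int p, int o) {
        int[] t = {s, p, o};
        pos.add(t);
        pso.add(t);
    }

    /** Subjects of statements with the given predicate and object (range scan on POS). */
    List<Integer> subjects(int p, int o) {
        List<Integer> result = new ArrayList<>();
        int[] from = {Integer.MIN_VALUE, p, o};
        int[] to = {Integer.MAX_VALUE, p, o};
        for (int[] t : pos.subSet(from, true, to, true)) {
            result.add(t[S]);
        }
        return result;
    }

    /** Objects of statements with the given subject and predicate (range scan on PSO). */
    List<Integer> objects(int s, int p) {
        List<Integer> result = new ArrayList<>();
        int[] from = {s, p, Integer.MIN_VALUE};
        int[] to = {s, p, Integer.MAX_VALUE};
        for (int[] t : pso.subSet(from, true, to, true)) {
            result.add(t[O]);
        }
        return result;
    }
}
```

In this model each lookup touches only a contiguous range of one sorted set: a pattern with a known predicate and object is served by the POS order, and a pattern with a known predicate and subject by the PSO order.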

Entity Pool

The Entity Pool is a key component of the GraphDB storage layer. It converts entities (URIs, Blank nodes and Literals) to internal IDs (32- or 40-bit integers). It supports transactional behaviour, which improves space usage and cluster behaviour.
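The mapping from entities to internal IDs can be pictured as a two-way dictionary. The sketch below is a minimal in-memory illustration with hypothetical names (EntityPoolSketch, getOrCreateId, getValue); it uses Java int IDs and plain strings for entities, and it leaves out persistence and transactions entirely, so it should not be read as how the Entity Pool is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not GraphDB's actual Entity Pool.
class EntityPoolSketch {
    private final Map<String, Integer> idsByValue = new HashMap<>();
    private final List<String> valuesById = new ArrayList<>();

    /** Returns the existing ID for an entity, or assigns the next free one. */
    int getOrCreateId(String entity) {
        Integer existing = idsByValue.get(entity);
        if (existing != null) {
            return existing;
        }
        int id = valuesById.size() + 1; // IDs start at 1 in this sketch
        idsByValue.put(entity, id);
        valuesById.add(entity);
        return id;
    }

    /** Resolves an ID back to the entity's string form. */
    String getValue(int id) {
        return valuesById.get(id - 1);
    }

    public static void main(String[] args) {
        EntityPoolSketch pool = new EntityPoolSketch();
        int s = pool.getOrCreateId("<http://example.org/alice>");
        int p = pool.getOrCreateId("<http://example.org/knows>");
        int o = pool.getOrCreateId("_:b1");
        // A statement is now three small integers instead of three strings.
        System.out.println(s + " " + p + " " + o);
        System.out.println(pool.getValue(p));
    }
}
```

Storing each distinct URI, blank node or literal once and referring to it by a fixed-width number is what lets the statement indices work on compact integer tuples.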

Connectors

The Connectors provide extremely fast keyword and faceted (aggregation) searches that are typically implemented by an external component or service, but have the additional benefit of staying automatically up-to-date with the GraphDB repository data. GraphDB comes with the following connector implementations:

Workbench

The Workbench is the default web-based administration tool.