Architecture & components¶
Architecture¶
GraphDB is packaged as a Storage and Inference Layer (SAIL) for RDF4J and makes extensive use of the features and infrastructure of RDF4J, especially the RDF model, RDF parsers and query engines.
Inference is performed by the Reasoner (TRREE Engine), where the explicit and inferred statements are stored in highly optimised data structures that are kept in memory for query evaluation and further inference. The inferred closure is updated at the end of each transaction that modifies the repository.
GraphDB implements the Sail API so that it can be integrated with the rest of the RDF4J framework, e.g., the query engines and the web UI. A user application can be designed to use GraphDB directly through the RDF4J SAIL API or via the higher-level functional interfaces. When a GraphDB repository is exposed using the RDF4J HTTP Server, users can manage the repository through the embedded Workbench, the RDF4J Workbench, or other tools integrated with RDF4J.

GraphDB High-level Architecture
RDF4J¶
RDF4J is a Java framework for storing, querying and reasoning with RDF data. It was originally developed by Aduna (under the name Sesame) as an open-source project and includes various storage back-ends (memory, file, database), query languages, reasoners and client-server protocols.
There are essentially two ways to use RDF4J:
- as a standalone server;
- embedded in an application as a Java library.
RDF4J supports the W3C SPARQL query language. It also supports the most popular RDF file formats and query result formats.
RDF4J offers a JDBC-like user API, streamlined system APIs and a RESTful HTTP interface. Various extensions are available or are being developed by third parties.
RDF4J Architecture
The following is a schematic representation of RDF4J’s architecture and a brief overview of the main components.

The RDF4J architecture
The RDF4J framework is a loosely coupled set of components, where alternative implementations can be easily exchanged. RDF4J comes with a variety of Storage And Inference Layer (SAIL) implementations that a user can select for the desired behaviour (in-memory storage, file system, relational database, etc.). GraphDB is a plugin SAIL component for the RDF4J framework.
Applications will normally communicate with RDF4J through the Repository API. This provides a high enough level of abstraction so that the details of particular underlying components remain hidden, i.e., different components can be swapped without requiring modification of the application.
The Repository API has several implementations, one of which uses HTTP to communicate with a remote repository that exposes the Repository API via HTTP.
The Sail API¶
The Sail API is a set of Java interfaces that support RDF storing, retrieving, deleting and inferencing. It is used for abstracting from the actual storage mechanism, e.g., an implementation can use relational databases, file systems, in-memory storage, etc. Its main characteristics are:
- flexibility and freedom for optimisations so that huge amounts of data can be handled efficiently on enterprise-level machines;
- extendability to other RDF-based languages;
- stacking of SAILs;
- concurrency control for any type of repository.
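The idea of stacking SAILs can be sketched in a few lines of plain Java. The `MiniSail` interface and both classes below are hypothetical, simplified stand-ins invented for this illustration; they are not the RDF4J Sail interfaces, which are considerably richer (connections, transactions, contexts). The sketch only shows the layering principle: an upper layer wraps a base layer, adds behaviour on write and delegates everything else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified storage-layer interface (NOT the RDF4J Sail API).
interface MiniSail {
    void add(String s, String p, String o);
    // A null argument acts as a wildcard.
    List<String[]> match(String s, String p, String o);
}

// Bottom of the stack: keeps statements in an in-memory list.
class MemoryMiniSail implements MiniSail {
    private final List<String[]> statements = new ArrayList<>();

    public void add(String s, String p, String o) {
        statements.add(new String[] {s, p, o});
    }

    public List<String[]> match(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] st : statements) {
            if ((s == null || s.equals(st[0]))
                    && (p == null || p.equals(st[1]))
                    && (o == null || o.equals(st[2]))) {
                out.add(st);
            }
        }
        return out;
    }
}

// Stacked layer: stores the inverse of every "knows" statement as well,
// then delegates all storage and lookup to the layer below.
class SymmetricKnowsSail implements MiniSail {
    private final MiniSail base;

    SymmetricKnowsSail(MiniSail base) {
        this.base = base;
    }

    public void add(String s, String p, String o) {
        base.add(s, p, o);
        if (p.equals("knows")) {
            base.add(o, p, s);
        }
    }

    public List<String[]> match(String s, String p, String o) {
        return base.match(s, p, o);
    }
}

class SailStackDemo {
    public static void main(String[] args) {
        MiniSail sail = new SymmetricKnowsSail(new MemoryMiniSail());
        sail.add("alice", "knows", "bob");
        // The stacked layer added the statement "bob knows alice".
        System.out.println(sail.match("bob", "knows", null).size());
    }
}
```

Because the application only sees `MiniSail`, the base layer could be replaced by a different storage implementation without touching the layer stacked on top of it.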
Components¶
Engine¶
Query optimiser¶
The query optimiser attempts to determine the most efficient way to execute a given query by considering the possible query plans. After a query is submitted and parsed, it is passed to the query optimiser, which selects the plan to execute. GraphDB allows hints for guiding the query optimiser.
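As a rough illustration of what "considering the possible query plans" can mean, the sketch below enumerates every ordering of a set of triple patterns and keeps the cheapest one. It is not GraphDB's optimiser: the class name, the pattern strings, the cardinality estimates and the cost model (the sum of the estimated intermediate result sizes, where each join multiplies the running size by the next pattern's estimate) are all assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PlanSketch {
    // Crude, assumed cost model: sum of the estimated intermediate result sizes.
    static double cost(List<String> plan, Map<String, Double> estimates) {
        double running = 1.0;
        double total = 0.0;
        for (String pattern : plan) {
            running *= estimates.get(pattern);
            total += running;
        }
        return total;
    }

    // Tries every ordering of the patterns and returns the cheapest one.
    static List<String> cheapest(Map<String, Double> estimates) {
        List<String> best = new ArrayList<>();
        search(new ArrayList<>(estimates.keySet()), new ArrayList<>(), estimates, best);
        return best;
    }

    private static void search(List<String> remaining, List<String> prefix,
                               Map<String, Double> estimates, List<String> best) {
        if (remaining.isEmpty()) {
            // A complete plan: keep it if it beats the best plan seen so far.
            if (best.isEmpty() || cost(prefix, estimates) < cost(best, estimates)) {
                best.clear();
                best.addAll(prefix);
            }
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> rest = new ArrayList<>(remaining);
            prefix.add(rest.remove(i));
            search(rest, prefix, estimates, best);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Hypothetical patterns with made-up cardinality estimates.
        Map<String, Double> estimates = Map.of(
                "?s rdf:type :Person", 1_000_000.0,
                "?s :name ?n", 500_000.0,
                "?s :email \"a@example.org\"", 1.0);
        System.out.println(cheapest(estimates));
    }
}
```

Exhaustive enumeration grows factorially with the number of patterns, so it is only workable for very small queries; it is used here purely to make the notion of comparing plans concrete.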
Reasoner (TRREE Engine)¶
GraphDB is implemented on top of the TRREE engine. TRREE stands for ‘Triple Reasoning and Rule Entailment Engine’. TRREE performs reasoning based on forward-chaining of entailment rules over RDF triple patterns with variables. Its reasoning strategy is total materialisation, although various optimisations are used. Further details of the rule language can be found in the Reasoning section.
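Forward-chaining with total materialisation can be illustrated with a naive fixpoint loop. The sketch below is not the TRREE implementation and does not use GraphDB's rule language; the class and method names are hypothetical, and only two well-known RDFS-style rules (transitivity of `rdfs:subClassOf` and type inheritance) are hard-coded. Rules are applied repeatedly until no new statement can be derived, so the result holds both the explicit and all inferred statements.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ForwardChainingSketch {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    // A statement is an immutable 3-element list: subject, predicate, object.
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    // Applies both rules until a fixpoint is reached (no new statements).
    static Set<List<String>> materialise(Set<List<String>> explicit) {
        Set<List<String>> closure = new LinkedHashSet<>(explicit);
        boolean changed = true;
        while (changed) {
            List<List<String>> derived = new ArrayList<>();
            for (List<String> a : closure) {
                for (List<String> b : closure) {
                    // Both rules need a second premise "y subClassOf z" that
                    // joins with the object y of the first premise.
                    if (!b.get(1).equals(SUBCLASS) || !a.get(2).equals(b.get(0))) {
                        continue;
                    }
                    // Rule 1: x subClassOf y, y subClassOf z => x subClassOf z
                    if (a.get(1).equals(SUBCLASS)) {
                        derived.add(triple(a.get(0), SUBCLASS, b.get(2)));
                    }
                    // Rule 2: x type y, y subClassOf z => x type z
                    if (a.get(1).equals(TYPE)) {
                        derived.add(triple(a.get(0), TYPE, b.get(2)));
                    }
                }
            }
            // addAll returns true only if at least one statement was new.
            changed = closure.addAll(derived);
        }
        return closure;
    }

    public static void main(String[] args) {
        Set<List<String>> explicit = Set.of(
                triple(":Dog", SUBCLASS, ":Mammal"),
                triple(":Mammal", SUBCLASS, ":Animal"),
                triple(":rex", TYPE, ":Dog"));
        materialise(explicit).forEach(System.out::println);
    }
}
```

With total materialisation the cost of inference is paid when data is written, and queries then read the inferred statements exactly like explicit ones. This naive loop re-examines every pair of statements on each pass; a real engine would use the kind of optimisations the text alludes to.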
Storage¶
GraphDB stores all of its data in files in the configured storage directory, usually called ‘storage’. It consists of two main indices on statements, POS and PSO, a context index CPSO, a literal index and a page cache.
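The names POS and PSO describe the sort order of the statement components (predicate, object, subject versus predicate, subject, object). The in-memory sketch below shows why keeping both orders is useful: each lookup becomes a scan over one contiguous range. It says nothing about GraphDB's actual on-disk layout; the class, its methods and the use of `TreeSet` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class StatementIndexSketch {
    // A statement is a {subject, predicate, object} array of internal IDs.
    static final Comparator<long[]> PSO = Comparator
            .<long[]>comparingLong(t -> t[1])
            .thenComparingLong(t -> t[0])
            .thenComparingLong(t -> t[2]);
    static final Comparator<long[]> POS = Comparator
            .<long[]>comparingLong(t -> t[1])
            .thenComparingLong(t -> t[2])
            .thenComparingLong(t -> t[0]);

    private final TreeSet<long[]> pso = new TreeSet<>(PSO);
    private final TreeSet<long[]> pos = new TreeSet<>(POS);

    // Every statement is written to both indices.
    void add(long s, long p, long o) {
        long[] t = {s, p, o};
        pso.add(t);
        pos.add(t);
    }

    // Known predicate and subject: the objects form one contiguous PSO range.
    List<Long> objects(long p, long s) {
        List<Long> out = new ArrayList<>();
        for (long[] t : pso.subSet(new long[] {s, p, Long.MIN_VALUE}, true,
                                   new long[] {s, p, Long.MAX_VALUE}, true)) {
            out.add(t[2]);
        }
        return out;
    }

    // Known predicate and object: the subjects form one contiguous POS range.
    List<Long> subjects(long p, long o) {
        List<Long> out = new ArrayList<>();
        for (long[] t : pos.subSet(new long[] {Long.MIN_VALUE, p, o}, true,
                                   new long[] {Long.MAX_VALUE, p, o}, true)) {
            out.add(t[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        StatementIndexSketch index = new StatementIndexSketch();
        index.add(1, 10, 100);
        index.add(2, 10, 100);
        index.add(1, 10, 101);
        System.out.println(index.subjects(10, 100));
        System.out.println(index.objects(10, 1));
    }
}
```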
Entity Pool¶
The Entity Pool is a key component of the GraphDB storage layer. It converts entities (URIs, Blank nodes and Literals) to internal IDs (32- or 40-bit integers). It supports transactional behaviour, which improves space usage and cluster behaviour.
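The core mapping performed by an entity pool can be sketched as a two-way dictionary between RDF values and integer IDs. The class below is a hypothetical, in-memory illustration only: it does not model GraphDB's file format, the 32- or 40-bit ID widths, or the transactional behaviour mentioned above, and the choice to start IDs at 1 is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntityPoolSketch {
    private final Map<String, Long> idByValue = new HashMap<>();
    private final List<String> valueById = new ArrayList<>();

    // Returns the existing ID for a value, or assigns the next free one.
    long idOf(String value) {
        Long existing = idByValue.get(value);
        if (existing != null) {
            return existing;
        }
        long id = valueById.size() + 1L; // IDs start at 1 in this sketch
        valueById.add(value);
        idByValue.put(value, id);
        return id;
    }

    // Resolves an ID back to the value it was assigned to.
    String valueOf(long id) {
        return valueById.get((int) (id - 1));
    }

    public static void main(String[] args) {
        EntityPoolSketch pool = new EntityPoolSketch();
        long alice = pool.idOf("http://example.org/alice");
        long name = pool.idOf("\"Alice\"");
        System.out.println(alice + " " + name + " " + pool.valueOf(alice));
    }
}
```

Storing statements as triples of compact IDs instead of full strings is what lets the statement indices stay small and compare values cheaply; the strings are only needed again when results are returned.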
Connectors¶
The Connectors provide extremely fast keyword and faceted (aggregation) searches that are typically implemented by an external component or service, but have the additional benefit of staying automatically up-to-date with the GraphDB repository data. GraphDB comes with the following connector implementations: