Architecture & Components¶
Architecture¶
GraphDB is packaged as a SAIL (Storage And Inference Layer) for RDF4J and makes extensive use of the features and infrastructure of RDF4J, especially the RDF model, RDF parsers, and query engines.
Inference is performed by the Reasoner (TRREE Engine), where the explicit and inferred statements are stored in highly optimized data structures that are kept in-memory for query evaluation and further inference. The inferred closure is updated through inference at the end of each transaction that modifies the repository.
GraphDB implements the Sail API so that it can be integrated with the rest of the RDF4J framework, e.g., the query engines and the web UI. A user application can be designed to use GraphDB directly through the RDF4J SAIL API or via the higher-level functional interfaces. When a GraphDB repository is exposed using the RDF4J HTTP Server, users can manage the repository through the embedded Workbench, the RDF4J Workbench, or other tools integrated with RDF4J.

GraphDB High-level Architecture
RDF4J¶
RDF4J is a Java framework for storing, querying, and reasoning with RDF data. It is an open-source project, originally developed by Aduna under the name Sesame and now maintained at the Eclipse Foundation, and includes various storage back-ends (memory, file, database), query languages, reasoners, and client-server protocols.
There are essentially two ways to use RDF4J:
as a standalone server;
embedded in an application as a Java library.
RDF4J supports the W3C SPARQL query language, as well as the most popular RDF file formats and query result formats.
RDF4J offers a JDBC-like user API, streamlined system APIs and a RESTful HTTP interface. Various extensions are available or are being developed by third parties.
RDF4J Architecture
The following is a schematic representation of the RDF4J architecture and a brief overview of the main components.

The RDF4J architecture
The RDF4J framework is a loosely coupled set of components, where alternative implementations can be easily exchanged. RDF4J comes with a variety of SAIL implementations that a user can select for the desired behavior (in-memory storage, file system, relational database, etc.). GraphDB is a plugin SAIL component for the RDF4J framework.
Applications will normally communicate with RDF4J through the Repository API. This provides a sufficient level of abstraction so that the details of particular underlying components remain hidden, i.e., different components can be swapped without requiring modification of the application.
The Repository API has several implementations, one of which uses HTTP to communicate with a remote repository that exposes the Repository API via HTTP.
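As a rough illustration of the HTTP route, the sketch below builds (but does not send) a SPARQL query request in the style of the RDF4J REST protocol, where a repository is addressed as `<server>/repositories/<id>` and the query is passed in a `query` parameter. The server address, port, and repository ID `myrepo` are assumptions made for the example, and `build_sparql_request` is a hypothetical helper, not part of any RDF4J client library.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(server_url: str, repository_id: str, sparql: str) -> Request:
    """Build a GET request for a SPARQL query against a repository endpoint."""
    endpoint = f"{server_url.rstrip('/')}/repositories/{repository_id}"
    url = f"{endpoint}?{urlencode({'query': sparql})}"
    # Ask for SPARQL JSON results; other result formats can be negotiated.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical server address and repository ID.
request = build_sparql_request(
    "http://localhost:7200", "myrepo", "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
)
```

Passing `request` to `urllib.request.urlopen` would send it to a running server; the application code stays the same whichever SAIL backs the repository.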
The Sail API¶
The Sail API is a set of Java interfaces that support RDF storing, retrieving, deleting, and inferencing. It is used for abstracting from the actual storage mechanism, e.g., an implementation can use relational databases, file systems, in-memory storage, etc. One of its key characteristics is the option for SAIL stacking.
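SAIL stacking can be pictured as layers that share one interface, each delegating to the layer beneath it. The following is a conceptual sketch only: the class names and the symmetric-property rule are invented for illustration and do not mirror the actual Java interfaces.

```python
class MemoryStore:
    """Bottom of the stack: keeps statements in a set."""

    def __init__(self):
        self._statements = set()

    def add(self, s, p, o):
        self._statements.add((s, p, o))

    def statements(self):
        return set(self._statements)


class SymmetricInferencer:
    """Stacked layer: delegates storage and adds inferred statements on write."""

    def __init__(self, base, symmetric_predicates):
        self._base = base
        self._symmetric = set(symmetric_predicates)

    def add(self, s, p, o):
        self._base.add(s, p, o)
        if p in self._symmetric:
            # Inferred statement: the relation also holds in reverse.
            self._base.add(o, p, s)

    def statements(self):
        return self._base.statements()


# The caller only sees the top of the stack.
store = SymmetricInferencer(MemoryStore(), {"ex:marriedTo"})
store.add("ex:alice", "ex:marriedTo", "ex:bob")
```

Because both layers expose the same `add` and `statements` methods, the inferencing layer could sit on top of a different store without any change to the calling code.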
Components¶
Engine¶
Query optimizer¶
The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. After a query is submitted and parsed, it is passed to the query optimizer, which selects a plan before evaluation. GraphDB allows hints for guiding the query optimizer.
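One common ingredient of query planning is evaluating the most selective triple patterns first. The sketch below shows that single heuristic on a toy data set; it is an assumption chosen for illustration and does not describe how GraphDB's optimizer actually compares plans. The function names and data are hypothetical.

```python
def count_matches(pattern, triples):
    """Count triples matching a pattern; None acts as a variable."""
    return sum(
        all(term is None or term == value for term, value in zip(pattern, triple))
        for triple in triples
    )


def order_patterns(patterns, triples):
    """Order patterns so that those with the fewest matches come first."""
    return sorted(patterns, key=lambda pattern: count_matches(pattern, triples))


triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
]
patterns = [
    (None, "rdf:type", "ex:Person"),    # matches three statements
    (None, "ex:worksFor", "ex:acme"),   # matches one statement
]
plan = order_patterns(patterns, triples)
```

Starting with the `ex:worksFor` pattern binds the subject to a single value, so the second pattern only has to be checked for that one subject.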
Reasoner (TRREE Engine)¶
GraphDB is implemented on top of the TRREE engine. TRREE stands for ‘Triple Reasoning and Rule Entailment Engine’. TRREE performs reasoning by forward-chaining entailment rules over RDF triple patterns with variables. Its reasoning strategy is total materialization, although various optimizations are used. Further details about the rule language can be found in the Reasoning section.
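Forward-chaining with total materialization can be sketched as applying rules repeatedly until no new statement appears. The code below is a naive fixpoint loop with two RDFS-style rules written as plain functions; it illustrates the strategy only and makes no claim about TRREE's rule language, data structures, or optimizations.

```python
def materialize(explicit, rules):
    """Apply every rule until no new statement is produced (a fixpoint)."""
    closure = set(explicit)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(closure) - closure
            if new:
                closure |= new
                changed = True
    return closure


def subclass_transitivity(statements):
    """A subClassOf B and B subClassOf C entail A subClassOf C."""
    sub = [(s, o) for s, p, o in statements if p == "rdfs:subClassOf"]
    return {(a, "rdfs:subClassOf", d) for a, b in sub for c, d in sub if b == c}


def type_inheritance(statements):
    """x type A and A subClassOf B entail x type B."""
    sub = [(s, o) for s, p, o in statements if p == "rdfs:subClassOf"]
    types = [(s, o) for s, p, o in statements if p == "rdf:type"]
    return {(x, "rdf:type", sup) for x, cls in types for c, sup in sub if cls == c}


explicit = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}
closure = materialize(explicit, [subclass_transitivity, type_inheritance])
inferred = closure - explicit
```

Since the closure is computed at write time, a query such as "all instances of `ex:Animal`" becomes a plain lookup that needs no reasoning at query time.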
Storage¶
GraphDB stores all of its data in files in the configured storage directory, usually called storage. The data consists of two main indexes on statements, POS and PSO, a context index, CPSO, and a literal index; the latter two are optional.
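The index names describe the order in which statement components are sorted: POS is predicate-object-subject and PSO is predicate-subject-object. A minimal in-memory sketch of why two orderings are useful follows; it uses sorted Python lists and strings where a real store would use on-disk structures and internal IDs, and the class and method names are hypothetical.

```python
from bisect import bisect_left


class StatementIndexes:
    def __init__(self, triples):
        # Each index holds the same statements, sorted in a different component order.
        self.pos = sorted((p, o, s) for s, p, o in triples)
        self.pso = sorted((p, s, o) for s, p, o in triples)

    @staticmethod
    def _scan(index, prefix):
        """Yield the contiguous run of entries that start with the given prefix."""
        start = bisect_left(index, prefix)
        for entry in index[start:]:
            if entry[:len(prefix)] != prefix:
                break
            yield entry

    def subjects(self, predicate, obj):
        """Answer ?s for a known predicate and object using POS."""
        return [s for _, _, s in self._scan(self.pos, (predicate, obj))]

    def objects(self, predicate, subject):
        """Answer ?o for a known predicate and subject using PSO."""
        return [o for _, _, o in self._scan(self.pso, (predicate, subject))]


indexes = StatementIndexes([
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
])
```

Either lookup is a binary search followed by a short sequential scan, because the entries that share the known components sit next to each other in the matching index.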
Entity Pool¶
The Entity Pool is a key component of the GraphDB storage layer. It converts entities (URIs, blank nodes, literals, and RDF-star [formerly RDF*] embedded triples) to internal IDs (32- or 40-bit integers). It supports transactional behavior, which improves space usage and cluster behavior.
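The idea of mapping entities to internal IDs, with IDs released when a transaction is rolled back, can be sketched as follows. This is a simplified model under stated assumptions: IDs are plain Python integers rather than 32- or 40-bit values, nothing is persisted, and the method names are hypothetical.

```python
class EntityPool:
    def __init__(self):
        self._id_by_value = {}
        self._value_by_id = []   # position i holds the value with ID i + 1
        self._pending = []       # values added in the current transaction

    def get_or_create_id(self, value):
        """Return the ID of an entity, assigning the next free ID if it is new."""
        if value not in self._id_by_value:
            self._id_by_value[value] = len(self._value_by_id) + 1
            self._value_by_id.append(value)
            self._pending.append(value)
        return self._id_by_value[value]

    def value_of(self, entity_id):
        return self._value_by_id[entity_id - 1]

    def commit(self):
        """Make the IDs assigned in this transaction permanent."""
        self._pending.clear()

    def rollback(self):
        """Discard the IDs handed out since the last commit so they can be reused."""
        for value in self._pending:
            del self._id_by_value[value]
        del self._value_by_id[len(self._value_by_id) - len(self._pending):]
        self._pending.clear()


pool = EntityPool()
alice = pool.get_or_create_id("ex:alice")
pool.commit()
pool.get_or_create_id("ex:temp")   # part of a transaction that is abandoned
pool.rollback()
bob = pool.get_or_create_id("ex:bob")
```

In this model the ID that was briefly given to `ex:temp` is handed to `ex:bob`, which is one way transactional behavior can avoid wasting ID space on entities that never reach the repository.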
Page Cache¶
GraphDB’s cache strategy employs the concept of one global cache shared between all internal structures of all repositories, so that you no longer have to configure the cache-memory, tuple-index-memory and predicate-memory, or size every instance and calculate the amount of memory dedicated to it. If one of the repositories is used more at the moment, it naturally gets more slots in the cache.
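How a busier repository ends up with more slots can be seen in a small model of a shared cache. The sketch assumes least-recently-used eviction and a capacity of at least one page; the eviction policy is not stated in this document, and the class, its methods, and the `load_page` callback are hypothetical.

```python
from collections import OrderedDict


class GlobalPageCache:
    def __init__(self, capacity):
        self.capacity = capacity        # total slots shared by all repositories
        self._pages = OrderedDict()     # (repository, page_id) -> page data

    def get(self, repository, page_id, load_page):
        """Return a page, loading it and evicting the oldest entry if needed."""
        key = (repository, page_id)
        if key in self._pages:
            self._pages.move_to_end(key)        # mark as most recently used
            return self._pages[key]
        page = load_page(repository, page_id)
        self._pages[key] = page
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)     # drop the least recently used page
        return page

    def slots_used_by(self, repository):
        return sum(1 for repo, _ in self._pages if repo == repository)


def load_page(repository, page_id):
    # Stand-in for reading a page from disk.
    return f"{repository}:{page_id}"


cache = GlobalPageCache(capacity=3)
cache.get("quiet", 0, load_page)
for page_id in range(3):
    cache.get("busy", page_id, load_page)
```

No per-repository size is configured anywhere: the `busy` repository takes over all three slots simply because its pages were touched most recently.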
Connectors¶
The Connectors provide extremely fast keyword and faceted (aggregation) searches that are typically implemented by an external component or service, but have the additional benefit of staying automatically up-to-date with the GraphDB repository data. GraphDB comes with the following connector implementations:
Solr GraphDB Connector (requires a GraphDB Enterprise license)
Elasticsearch GraphDB Connector (requires a GraphDB Enterprise license)
Additionally, the Kafka GraphDB Connector (requires a GraphDB Enterprise license) provides a means to synchronize changes to the RDF model to any Kafka consumer.
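The "automatically up-to-date" behavior can be thought of as the repository pushing each committed change to its registered connectors. The following sketch models that with a listener callback and a toy keyword index; it is an illustration under that assumption and does not use or describe the real connector configuration or the Solr, Elasticsearch, or Kafka APIs. All names are hypothetical.

```python
class Repository:
    def __init__(self):
        self.statements = set()
        self._listeners = []

    def register(self, listener):
        self._listeners.append(listener)

    def commit(self, added=(), removed=()):
        """Apply a change set, then notify every registered listener."""
        added, removed = set(added), set(removed)
        self.statements = (self.statements - removed) | added
        for listener in self._listeners:
            listener(added, removed)


class KeywordIndex:
    """Stand-in for an external search index fed by a connector."""

    def __init__(self):
        self.subjects_by_word = {}

    def on_change(self, added, removed):
        # Simplification: assumes a subject has at most one statement per word.
        for s, _, o in removed:
            for word in o.lower().split():
                self.subjects_by_word.get(word, set()).discard(s)
        for s, _, o in added:
            for word in o.lower().split():
                self.subjects_by_word.setdefault(word, set()).add(s)

    def search(self, word):
        return sorted(self.subjects_by_word.get(word.lower(), set()))


repository = Repository()
index = KeywordIndex()
repository.register(index.on_change)

label = ("ex:doc1", "rdfs:label", "Graph database")
repository.commit(added=[label])
hits_after_add = index.search("graph")
repository.commit(removed=[label])
hits_after_remove = index.search("graph")
```

The index is never updated directly by the application; it changes only as a side effect of commits, which is what keeps it consistent with the repository data.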