Architecture & Components

Architecture

GraphDB is packaged as a Storage And Inference Layer (SAIL) for RDF4J and makes extensive use of the features and infrastructure of RDF4J, especially the RDF model, RDF parsers, and query engines.

Inference is performed by the Reasoner (TRREE Engine), where the explicit and inferred statements are stored in highly optimised data structures that are kept in-memory for query evaluation and further inference. The inferred closure is updated through inference at the end of each transaction that modifies the repository.

GraphDB implements The Sail API interface so that it can be integrated with the rest of the RDF4J framework, e.g., the query engines and the web UI. A user application can be designed to use GraphDB directly through the RDF4J SAIL API or via the higher-level functional interfaces. When a GraphDB repository is exposed using the RDF4J HTTP Server, users can manage the repository through the embedded Workbench, the RDF4J Workbench, or other tools integrated with RDF4J.

_images/GraphDB_High-level_architecture.png

GraphDB High-level Architecture

RDF4J

The RDF4J framework is a framework for storing, querying, and reasoning with RDF data. It is implemented in Java by Aduna as an open source project and includes various storage back-ends (memory, file, database), query languages, reasoners, and client-server protocols.

There are essentially two ways to use RDF4J:

  • as a standalone server;
  • embedded in an application as a Java library.

RDF4J supports the W3C SPARQL query language, as well as the most popular RDF file formats and query result formats.

RDF4J offers a JDBC-like user API, streamlined system APIs and a RESTful HTTP interface. Various extensions are available or are being developed by third parties.

RDF4J Architecture

The following is a schematic representation of the RDF4J architecture and a brief overview of the main components.

_images/sesame_architecture.png

The RDF4J architecture

The RDF4J framework is a loosely coupled set of components, where alternative implementations can be easily exchanged. RDF4J comes with a variety of Storage And Inference Layer (SAIL) implementations that a user can select for the desired behavior (in-memory storage, file system, relational database, etc). GraphDB is a plugin SAIL component for the RDF4J framework.

Applications will normally communicate with RDF4J through the Repository API. This provides a sufficient level of abstraction so that the details of particular underlying components remain hidden, i.e., different components can be swapped without requiring modification of the application.

The Repository API has several implementations, one of which uses HTTP to communicate with a remote repository that exposes the Repository API via HTTP.

The Sail API

The Sail API is a set of Java interfaces that support RDF storing, retrieving, deleting, and inferencing. It is used for abstracting from the actual storage mechanism, e.g., an implementation can use relational databases, file systems, in-memory storage, etc. One of its key characteristics is the option for SAIL stacking.

Components

Engine

Query optimizer

The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. Once queries are submitted and parsed, they are then passed to the query optimizer where optimization occurs. GraphDB allows hints for guiding the query optimizer.

Reasoner (TRREE Engine)

GraphDB is implemented on top of the TRREE engine. TRREE stands for ‘Triple Reasoning and Rule Entailment Engine’. The TRREE performs reasoning based on forward-chaining of entailment rules over RDF triple patterns with variables. TRREE’s reasoning strategy is total materialization, although various optimizations are used. Further details about the rule language can be found in the Reasoning section.

Storage

GraphDB stores all of its data in files in the configured storage directory, usually called storage. It consists of two main indices on statements, POS and PSO, context index CPSO, and literal index, with the latter two being optional.

Entity Pool

The Entity Pool is a key component of the GraphDB storage layer. It converts entities (URIs, blank nodes and literals) to internal IDs (32- or 40-bit integers). It supports transactional behavior, which improves space usage and cluster behavior.

Page Cache

GraphDB’s cache strategy employs the concept of one global cache shared between all internal structures of all repositories, so that you no longer have to configure the cache-memory, tuple-index-memory and predicate-memory, or size every worker and calculate the amount of memory dedicated to it. If one of the repositories is used more at the moment, it naturally gets more slots in the cache.

Connectors

The Connectors provide extremely fast keyword and faceted (aggregation) searches that are typically implemented by an external component or service, but have the additional benefit of staying automatically up-to-date with the GraphDB repository data. GraphDB comes with the following connector implementations:

Workbench

The Workbench is the GraphDB web-based administration tool.