Plugin API

What is the GraphDB Plugin API

The GraphDB Plugin API is a framework and a set of public classes and interfaces that allow developers to extend GraphDB in many useful ways. These extensions are bundled into plugins, which GraphDB discovers during its initialisation phase and then uses to delegate parts of its query processing tasks. The plugins are given low-level access to the GraphDB repository data, which enables them to do their job efficiently. They are discovered via the Java service discovery mechanism, which enables dynamic addition/removal of plugins from the system without having to recompile GraphDB or change any configuration files.

Description of a GraphDB plugin

A GraphDB plugin is a Java class that implements the com.ontotext.trree.sdk.Plugin interface. All public classes and interfaces of the plugin API are located in this Java package, i.e., com.ontotext.trree.sdk. Here is what the Plugin interface looks like in an abbreviated form:

public interface Plugin extends Service {
    void setStatements(Statements statements);

    void setEntities(Entities entities);

    void setOptions(SystemOptions options);

    void setDataDir(File dataDir);

    void setLogger(Logger logger);

    void initialize(InitReason reason);

    void setFingerprint(long fingerprint);

    long getFingerprint();

    void precommit(GlobalViewOnData view);

    void shutdown(ShutdownReason reason);
}

As it derives from the Service interface, the plugin is automatically discovered at run-time, provided that the following conditions also hold:

  • The plugin class is located in the classpath;
  • It is mentioned in a META-INF/services/com.ontotext.trree.sdk.Plugin file in the classpath or in a .jar that is in the classpath. The fully qualified class name has to be written on a separate line in such a file.
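For example, for a plugin class called com.example.ExamplePlugin (a hypothetical name used here for illustration), the file META-INF/services/com.ontotext.trree.sdk.Plugin would contain the single line:

```
com.example.ExamplePlugin
```

This follows the standard Java service discovery convention, where each line of the provider-configuration file names one implementation class.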

The only method introduced by the Service interface is getName(), which provides the plugin’s (service’s) name. This name must be unique within a particular GraphDB repository and it serves as a plugin identifier, which can be used at any time to retrieve a reference to the plugin instance.

A plugin can implement many more optional interfaces, each declared separately. Implementing any such complementary interface is the means to announce to the system what this particular plugin can do in addition to its mandatory plugin responsibilities. It is then automatically used as appropriate.

The life-cycle of a plugin

A plugin’s life-cycle consists of several phases:

  • Discovery - this phase is executed at repository initialisation. GraphDB searches the classpath for all plugin services registered in META-INF/services/com.ontotext.trree.sdk.Plugin service registry files and constructs a single instance of each plugin found.
  • Configuration - every plugin instance discovered and constructed during the previous phase is then configured. During this phase, plugins are injected with a Logger object, which they use for logging (setLogger(Logger logger)), and the path to their own data directory (setDataDir(File dataDir)), which they create, if needed, and then use to store their data. If a plugin does not need to store anything to disk, it can skip the creation of its data directory. However, if it does use it, it is guaranteed that this directory is unique and available only to the particular plugin it was assigned to. Plugins are also injected with Statements and Entities instances (see Repository internals (Statements and Entities)) and with a SystemOptions instance, which gives them access to the system-wide configuration options and settings.
  • Initialisation - after a plugin has been configured, the framework calls its initialize(InitReason reason) method so it gets the chance to do whatever initialisation work it needs to do. It is important at this point that the plugin has received all its configuration and low-level access to the repository data (Repository internals (Statements and Entities)).
  • Request - the plugin participates in request processing. This phase is optional for plugins. It is divided into several subphases and each plugin can choose to participate in any or none of them. The request phase includes not only the evaluation of, for instance, SPARQL queries, but also SPARQL/Update requests and getStatements calls. Here are the subphases of the request phase:
    • Pre-processing - plugins are given the chance to modify the request before it is processed. In this phase, they could also initialise a context object, which will be visible till the end of the request processing (Pre-processing);
    • Pattern interpretation - plugins can choose to provide results for requested statement patterns (Pattern interpretation);
    • Post-processing - before the request results are returned to the client, plugins are given a chance to modify them, filter them out or even insert new results (Post-processing);
  • Shutdown - during repository shutdown, each plugin is prompted to execute its own shutdown routines, free resources, flush data to disk, etc. This must be done in the shutdown(ShutdownReason reason) method.

Repository internals (Statements and Entities)

In order to enable efficient request processing, plugins are given low-level access to the repository data and internals. This is done through the Statements and Entities interfaces.

The Entities interface represents a set of RDF objects (URIs, blank nodes and literals). All such objects are termed entities and are given unique long identifiers. The Entities instance is responsible for resolving these objects from their identifiers and inversely for looking up the identifier of a given entity. Most plugins process entities using their identifiers, because dealing with integer identifiers is a lot more efficient than working with the actual RDF entities they represent. The Entities interface is the single entry point available to plugins for entity management. It supports the addition of new entities, entity replacement, look-up of entity type and properties, resolving entities, listening for entity change events, etc.

It is possible in a GraphDB repository to declare two RDF objects to be equivalent, e.g., by using owl:sameAs optimisation. In order to provide a way to use such declarations, the Entities interface assigns a class identifier to each entity. For newly created entities, this class identifier is the same as the entity identifier. When two entities are declared equivalent, one of them adopts the class identifier of the other, and thus they become members of the same equivalence class. The Entities interface exposes the entity class identifier for plugins to determine which entities are equivalent.
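The class-identifier bookkeeping described above can be illustrated with a small standalone simulation. This is not the real Entities implementation; the class name and method shapes here are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone simulation of the equivalence-class mechanism described
// above; NOT the actual Entities implementation.
class EquivalenceClasses {
    private final Map<Long, Long> classId = new HashMap<>();

    // A newly created entity starts in its own equivalence class
    public void create(long entityId) {
        classId.put(entityId, entityId);
    }

    // Declaring two entities equivalent: one side's class adopts
    // the class identifier of the other
    public void declareEquivalent(long a, long b) {
        long target = classId.get(a);
        long source = classId.get(b);
        // every member of b's class joins a's class
        classId.replaceAll((id, cls) -> cls == source ? target : cls);
    }

    public boolean areEquivalent(long a, long b) {
        return classId.get(a).equals(classId.get(b));
    }
}
```

After declareEquivalent(a, b), both entities (and every other member of b's former class) report the same class identifier, which is exactly how a plugin can test equivalence via the class identifier exposed by Entities.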

Entities within an Entities instance have a certain scope. There are three entity scopes:

  • Default - entities are persisted on disk and can be used in statements that are also physically stored on disk. These entities have positive (non-zero) identifiers and are often referred to as physical entities.
  • System - system entities have negative identifiers and are not persisted on disk. They can be used, for example, for system (or magic) predicates. They are available throughout the whole repository lifetime, but after a restart, they have to be re-created.
  • Request - entities are not persisted on disk and have negative identifiers. They only live in the scope of a particular request and are not visible to other concurrent requests. These entities disappear immediately after the request processing finishes. The request scope is useful for temporary entities such as literal values that are not expected to occur often (e.g. numerical values) and do not appear inside a physical statement.
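The identifier conventions of the three scopes can be sketched in a standalone simulation. The identifier assignment below is illustrative only (the real Entities implementation is internal to GraphDB), but it mirrors the rules above: positive identifiers for persisted entities, negative identifiers for system and request entities, and request entities discarded when the request finishes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Standalone sketch of entity scoping; NOT the actual Entities code.
class ScopedEntityPool {
    public enum Scope { DEFAULT, SYSTEM, REQUEST }

    private final Map<Long, String> values = new HashMap<>();
    private final Set<Long> requestIds = new HashSet<>();
    private long nextPersistentId = 1;  // positive: physical entities
    private long nextTransientId = -1;  // negative: system/request entities

    public long put(String value, Scope scope) {
        long id = (scope == Scope.DEFAULT) ? nextPersistentId++ : nextTransientId--;
        if (scope == Scope.REQUEST) {
            requestIds.add(id); // remember for cleanup
        }
        values.put(id, value);
        return id;
    }

    public String get(long id) {
        return values.get(id);
    }

    // Request-scope entities disappear when request processing finishes
    public void endRequest() {
        for (Long id : requestIds) {
            values.remove(id);
        }
        requestIds.clear();
    }
}
```

Note how, after endRequest(), the default and system entities remain resolvable while the request-scope entity is gone, matching the lifetimes described above.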

The Statements interface represents a set of RDF statements, where ‘statement’ means a quadruple of subject, predicate, object and context RDF entity identifiers. Statements can be added, removed and searched for. Additionally, a plugin can subscribe to receive statement event notifications:

  • transaction started;
  • statement added;
  • statement deleted;
  • transaction completed.

An important abstract class, which is related to GraphDB internals, is StatementIterator. It has a method boolean next(), which attempts to scroll the iterator onto the next available statement and returns true only if it succeeds. In case of success, its subject, predicate, object and context fields are initialised with the respective components of the next statement. Furthermore, some properties of each statement are available via the following methods:

  • boolean isReadOnly() - returns true if the statement is in the Axioms part of the rule-file or is imported at initialisation;
  • boolean isExplicit() - returns true if the statement is explicitly asserted;
  • boolean isImplicit() - returns true if the statement is produced by the inferencer (raw statements can be both explicit and implicit).

Here is a brief example that puts Statements, Entities and StatementIterator together, in order to output all literals that are related to a given URI:

// resolve the URI identifier
long id = entities.resolve(SimpleValueFactory.getInstance().createIRI("http://example/uri"));

// retrieve all statements with this identifier in subject position
StatementIterator iter = statements.get(id, 0, 0, 0);
while (iter.next()) {
    // only process literal objects
    if (entities.getType(iter.object) == Entities.Type.LITERAL) {
        // resolve the literal and print out its value
        Value literal = entities.get(iter.object);
        System.out.println(literal.stringValue());
    }
}

Request-processing phases

As already mentioned, a plugin’s interaction with each of the request-processing phases is optional. The plugin declares if it plans to participate in any phase by implementing the appropriate interface.

Pre-processing

A plugin willing to participate in request pre-processing must implement the Preprocessor interface. It looks like this:

public interface Preprocessor {
    RequestContext preprocess(Request request);
}

The preprocess() method receives the request object and returns a RequestContext instance. The Request instance passed as the parameter is a different class instance, depending on the type of the request (e.g., SPARQL/Update or “get statements”). The plugin modifies the request object as necessary, then initialises and returns its context object, which is passed back to it in every other method during the request processing phase. The returned request context may be null. Whatever its value, it is only visible to the plugin that initialised it, and it can be used to store data visible for (and only for) the whole request, e.g., to pass data related to two different statement patterns recognised by the plugin. The request context gives further request-processing phases access to the Request object reference. Plugins that opt to skip this phase do not have a request context and are not able to access the original Request object.

Pattern interpretation

This is one of the most important phases in the lifetime of a plugin. In fact, most plugins need to participate in exactly this phase. This is the point where request statement patterns need to get evaluated and statement results are returned.

For example, consider the following SPARQL query:

SELECT * WHERE {
    ?s <http://example/predicate> ?o
}

There is just one statement pattern inside this query: ?s <http://example/predicate> ?o. All plugins that have implemented the PatternInterpreter interface (thus declaring that they intend to participate in the pattern interpretation phase) are asked if they can interpret this pattern. The first one to accept it and return results will be used. If no plugin interprets the pattern, it is evaluated against the repository’s physical statements, i.e., the ones persisted on disk.

Here is the PatternInterpreter interface:

public interface PatternInterpreter {
    double estimate(long subject, long predicate, long object, long context, Statements statements,
            Entities entities, RequestContext requestContext);

    StatementIterator interpret(long subject, long predicate, long object, long context,
            Statements statements, Entities entities, RequestContext requestContext);
}

The estimate() and interpret() methods take the same arguments and are used in the following way:

  • Given a statement pattern (e.g., the one in the SPARQL query above), all plugins that implement PatternInterpreter are asked to interpret() the pattern. The subject, predicate, object and context values are either the identifiers of the values in the pattern or 0, if any of them is an unbound variable. The statements and entities objects represent respectively the statements and entities that are available for this particular request. For instance, if the query contains any FROM <http://some/graph> clauses, the statements object will only provide access to the statements in the defined named graphs. Similarly, the entities object contains entities that might be valid only for this particular request. The plugin’s interpret() method must return a StatementIterator if it intends to interpret this pattern, or null if it refuses.
  • In case the plugin signals that it will interpret the given pattern (returns a non-null value), GraphDB’s query optimiser calls the plugin’s estimate() method, in order to get an estimate of how many results the StatementIterator returned by interpret() will produce. This estimate does not need to be precise, but the more precise it is, the more likely the optimiser is to produce an efficient execution plan. There is a slight difference in the values that are passed to estimate(). The statement components (e.g., subject) might not only be entity identifiers, but can also be set to two special values:
    • Entities.BOUND - the pattern component is said to be bound, but its particular binding is not yet known;
    • Entities.UNBOUND - the pattern component will not be bound. These values must be treated as hints to the estimate() method to provide a better approximation of the result set size, although its precise value cannot be determined before the query is actually run.
  • After the query has been optimised, the interpret() method of the plugin might be called again should any variable become bound due to the pattern reordering applied by the optimiser. Plugins must be prepared to expect different combinations of bound and unbound statement pattern components, and return appropriate iterators.

The requestContext parameter is the value returned by the preprocess() method if one exists, or null otherwise.
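A typical estimate() keys off which components of the pattern are (or will be) bound. The following standalone sketch shows one plausible heuristic; the BOUND and UNBOUND constants are stand-ins for Entities.BOUND and Entities.UNBOUND (their real values are internal to the SDK), and the cardinalities are illustrative:

```java
// Standalone sketch of an estimate() heuristic; the constants and
// cardinalities are illustrative, not part of the actual SDK.
class EstimateSketch {
    public static final long UNBOUND = -1; // stand-in for Entities.UNBOUND
    public static final long BOUND = -2;   // stand-in for Entities.BOUND

    // total number of statements this plugin could produce
    private final double totalStatements;

    public EstimateSketch(double totalStatements) {
        this.totalStatements = totalStatements;
    }

    // Simplified to two components for brevity; a real estimate() also
    // receives predicate and context plus Statements/Entities/RequestContext
    public double estimate(long subject, long object) {
        // a concrete identifier (positive) or the BOUND hint means the
        // component will be known when interpret() runs
        boolean subjectBound = subject > 0 || subject == BOUND;
        boolean objectBound = object > 0 || object == BOUND;
        if (subjectBound && objectBound) {
            return 1;                          // point lookup
        } else if (subjectBound || objectBound) {
            return Math.sqrt(totalStatements); // one side narrows the scan
        }
        return totalStatements;                // full scan
    }
}
```

The exact numbers matter less than the ordering: a fully bound pattern should estimate far fewer results than an unconstrained one, so the optimiser can schedule it early.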

The plugin framework also supports the interpretation of an extended pattern type: the list pattern.

Consider the following SPARQL query:

SELECT * WHERE {
    ?s <http://example/predicate> (?o1 ?o2)
}

If a plugin wants to handle such list patterns, it has to implement an interface very similar to the PatternInterpreter interface - ListPatternInterpreter:

public interface ListPatternInterpreter {
    double estimate(long subject, long predicate, long[] objects, long context, Statements statements,
            Entities entities, RequestContext requestContext);

    StatementIterator interpret(long subject, long predicate, long[] objects, long context,
            Statements statements, Entities entities, RequestContext requestContext);
}

It only differs by having multiple objects passed as an array of long, instead of a single long object. The semantics of both methods are equivalent to those of the basic pattern interpretation case.

Post-processing

There are cases when a plugin would like to modify or otherwise filter the final results of a request. This is where the Postprocessor interface comes into play:

public interface Postprocessor {

    boolean shouldPostprocess(RequestContext requestContext);

    BindingSet postprocess(BindingSet bindingSet, RequestContext requestContext);

    Iterator<BindingSet> flush(RequestContext requestContext);
}

The postprocess() method is called for each binding set that is to be returned to the repository client. This method may modify the binding set and return it, or return null, in which case the binding set is removed from the result set. After a binding set is processed by one plugin, the possibly modified binding set is passed to the next plugin with post-processing functionality enabled. After the binding set is processed by all plugins (in the case where no plugin deletes it), it is returned to the client. Finally, after all results are processed and returned, each plugin’s flush() method is called so it can introduce new binding sets into the result set, which are also returned to the client.
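The chaining semantics described above can be sketched as a standalone pipeline. String stands in for BindingSet here, and the Stage interface is a simplified stand-in for Postprocessor; the ordering rules, however, are the ones from the text: every result passes through every stage, nulls drop the result, and flush() results are appended at the end:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Standalone sketch of the post-processing pipeline semantics;
// String stands in for BindingSet, Stage for Postprocessor.
class PostprocessPipeline {
    public interface Stage {
        String postprocess(String bindingSet); // null drops the result
        Iterator<String> flush();              // extra results, may be null
    }

    public static List<String> run(List<String> results, List<Stage> stages) {
        List<String> out = new ArrayList<>();
        for (String bs : results) {
            // each binding set passes through every stage in turn
            for (Stage stage : stages) {
                if (bs == null) break; // a stage deleted it
                bs = stage.postprocess(bs);
            }
            if (bs != null) out.add(bs); // survived all stages
        }
        // after all results are processed, each stage may add new ones
        for (Stage stage : stages) {
            Iterator<String> extra = stage.flush();
            while (extra != null && extra.hasNext()) {
                out.add(extra.next());
            }
        }
        return out;
    }
}
```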

Update processing

As well as query/read processing, plugins are able to process update operations for statement patterns containing specific predicates. In order to intercept updates, a plugin must implement the UpdateInterpreter interface. During initialisation, the getPredicatesToListenFor() method is called once by the framework, so that the plugin can indicate which predicates it is interested in.

From then onwards, the plugin framework filters updates for statements using these predicates and notifies the plugin. Filtered updates are not processed further by GraphDB, so if the insert or delete operation must be persisted, the plugin must handle this by using the Statements object passed to it.

/**
 * An interface that must be implemented by the plugins that want to be
 * notified for particular update events. The getPredicatesToListenFor()
 * method should return the predicates of interest to the plugin. This
 * method will be called once only immediately after the plugin has been
 * initialised. After that point the plugin's interpretUpdate() method
 * will be called for each inserted or deleted statement sharing one of the
 * predicates of interest to the plugin (those returned by
 * getPredicatesToListenFor()).
 */
public interface UpdateInterpreter {
    /**
     * Returns the predicates for which the plugin needs to get notified
     * when statement is added or removed and contains the predicates in
     * question
     *
     * @return array of predicates
     */
    long[] getPredicatesToListenFor();

    /**
     * Hook that handles updates that this interpreter is registered for
     *
     * @param subject subject value of the updated statement
     * @param predicate predicate value of the updated statement
     * @param object object value of the updated statement
     * @param context context value of the updated statement
     * @param isAddition true if the statement was added, false if it was removed
     * @param isExplicit true if the updated statement was explicit one
     * @param statements Statements instance that contains the updated statement
     * @param entities Entities instance for the request
     */
    void interpretUpdate(long subject, long predicate, long object, long context,
                         boolean isAddition, boolean isExplicit,
                         Statements statements, Entities entities);
}
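A typical UpdateInterpreter maintains some plugin-side index from the intercepted statements. The standalone sketch below shows that bookkeeping with plain longs; the predicate id 42 is a made-up example, and the real interpretUpdate() additionally receives the isExplicit flag plus the Statements and Entities instances (needed, among other things, if the plugin wants the filtered statement persisted):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Standalone sketch of UpdateInterpreter-style bookkeeping;
// the predicate id and index structure are illustrative.
class UpdateInterpreterSketch {
    private static final long MY_PREDICATE = 42; // hypothetical predicate id

    // in-memory index: subject -> objects asserted with our predicate
    private final Map<Long, Set<Long>> index = new HashMap<>();

    public long[] getPredicatesToListenFor() {
        return new long[] { MY_PREDICATE };
    }

    public void interpretUpdate(long subject, long predicate, long object,
                                boolean isAddition) {
        if (predicate != MY_PREDICATE) {
            return; // not a predicate we registered for
        }
        if (isAddition) {
            index.computeIfAbsent(subject, s -> new HashSet<>()).add(object);
        } else {
            Set<Long> objects = index.get(subject);
            if (objects != null) {
                objects.remove(object);
            }
        }
    }

    public Set<Long> objectsFor(long subject) {
        return index.getOrDefault(subject, Set.of());
    }
}
```

Because filtered updates are not processed further by GraphDB, a sketch like this one keeps the data only in its own index; a real plugin that also wants the triple stored must add it back through the Statements object.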

Putting it all together: an example plugin

The following example plugin has two responsibilities:

  • It interprets patterns such as ?s <http://example.com/time> ?o and binds their object component to a literal, containing the repository local date and time.
  • If a FROM <http://example.com/time> clause is detected in the query, the result is a single binding set in which all projected variables are bound to a literal containing the repository local date and time.

For the first part, it is clear that the plugin implements the PatternInterpreter interface. A date/time literal is stored as a request-scope entity to avoid cluttering the repository with extra literals.

For the second requirement, the plugin must first take part in the pre-processing phase, in order to inspect the query and detect the FROM clause. Then, the plugin must hook into the post-processing phase where, if the pre-processing phase has detected the desired FROM clause, it deletes all query results (in postprocess()) and returns a single result (in flush()) containing the binding set specified by the requirements. Again, request-scoped literals are created.

The plugin implementation extends the PluginBase class that provides a default implementation of the Plugin methods:

public class ExamplePlugin extends PluginBase {
    private static final IRI PREDICATE = SimpleValueFactory.getInstance().createIRI("http://example.com/time");
    private long predicateId;

    @Override
    public String getName() {
        return "example";
    }

    @Override
    public void initialize(InitReason reason) {
        predicateId = entities.put(PREDICATE, Entities.Scope.SYSTEM);
    }
}

In this basic implementation, the plugin name is defined and during initialisation, a single system-scope predicate is registered.

Note

It is important not to forget to register the plugin in the META-INF/services/com.ontotext.trree.sdk.Plugin file in the classpath.

The next step is to implement the first of the plugin’s requirements - the pattern interpretation part:

public class ExamplePlugin extends PluginBase implements PatternInterpreter {

    // ...

    @Override
    public StatementIterator interpret(long subject, long predicate, long object, long context,
            Statements statements, Entities entities, RequestContext requestContext) {
        // ignore patterns with predicate different than the one we recognize
        if (predicate != predicateId)
            return null;

        // create the date/time literal
        long literalId = createDateTimeLiteral();

        // return a StatementIterator with a single statement to be iterated
        return StatementIterator.create(subject, predicate, literalId, 0);
    }

    private long createDateTimeLiteral() {
        Value literal = SimpleValueFactory.getInstance().createLiteral(new Date().toString());
        return entities.put(literal, Scope.REQUEST);
    }

    @Override
    public double estimate(long subject, long predicate, long object, long context,
            Statements statements, Entities entities, RequestContext requestContext) {
        return 1;
    }
}

The interpret() method only processes patterns with a predicate matching the desired predicate identifier. Further on, it simply creates a new date/time literal (in the request scope) and places its identifier in the object position of the returned single result. The estimate() method always returns 1, because this is the exact size of the result set.

Finally, to implement the second requirement concerning the interpretation of the FROM clause:

public class ExamplePlugin extends PluginBase implements PatternInterpreter, Preprocessor,
                                                         Postprocessor {
    private static class Context implements RequestContext {
        private Request theRequest;
        private BindingSet theResult;

        public Context(BindingSet result) {
            theResult = result;
        }
        @Override
        public Request getRequest() {
            return theRequest;
        }
        @Override
        public void setRequest(Request request) {
            theRequest = request;
        }
        public BindingSet getResult() {
            return theResult;
        }
    }

    // ...

    @Override
    public RequestContext preprocess(Request request) {
        if (request instanceof QueryRequest) {
            QueryRequest queryRequest = (QueryRequest) request;
            Dataset dataset = queryRequest.getDataset();
            if (dataset != null && dataset.getDefaultGraphs().contains(PREDICATE)) {
                // create a date/time literal
                long literalId = createDateTimeLiteral();
                Value literal = entities.get(literalId);
                // prepare a binding set with all projected variables set
                // to the date/time literal value
                MapBindingSet result = new MapBindingSet();
                if (queryRequest.getTupleExpr() instanceof Projection) {
                    Projection projection = (Projection) queryRequest.getTupleExpr();
                    for (String bindingName : projection.getBindingNames()) {
                        result.addBinding(bindingName, literal);
                    }
                }
                return new Context(result);
            }
        }
        return null;
    }

    @Override
    public BindingSet postprocess(BindingSet bindingSet, RequestContext requestContext) {
        // if we have found the special FROM clause we filter out all results
        return requestContext != null ? null : bindingSet;
    }

    @Override
    public Iterator<BindingSet> flush(RequestContext requestContext) {
        // if we have found the special FROM clause we return the special binding set
        if (requestContext != null) {
            BindingSet result = ((Context) requestContext).getResult();
            return new SingletonIterator<BindingSet>(result);
        }
        return null;
    }
}

The plugin provides a custom implementation of the RequestContext interface, which holds a reference to the desired single BindingSet with the date/time literal bound to every variable name in the query projection. The postprocess() method filters out all results if the requestContext is non-null (i.e., if the FROM clause was detected by preprocess()). Finally, flush() returns a singleton iterator containing the desired binding set when the FROM clause was detected, or null otherwise.

Making a plugin configurable

Most plugins need to be configurable. There are two ways for GraphDB plugins to receive their configuration. The first approach is to define magic system predicates that can be used to pass configuration values to the plugin through a query at run-time. This approach is appropriate whenever the configuration changes from one plugin usage scenario to another, i.e., when there are no globally valid parameters for the plugin. However, in many cases the plugin behaviour has to be configured ‘globally’, and for that the plugin framework provides a suitable mechanism through the Configurable interface.

A plugin implements the Configurable interface to announce its configuration parameters to the system. This allows it to read parameter values during initialisation from the repository configuration and have them merged with all other repository parameters (accessible through the SystemOptions instance passed during the configuration phase).

This is the Configurable interface:

public interface Configurable {
    public String[] getParameters();
}

The plugin needs to enumerate its configuration parameter names. The example plugin is extended with the ability to define the name of the special predicate it uses. The parameter is called predicate-uri and accepts a URI value.

public class ExamplePlugin extends PluginBase implements PatternInterpreter, Preprocessor,
                                                         Postprocessor, Configurable {
    private static final String DEFAULT_PREDICATE = "http://example.com/time";
    private static final String PREDICATE_PARAM = "predicate-uri";

    // ...

    @Override
    public String[] getParameters() {
        return new String[] { PREDICATE_PARAM };
    }

    // ...

    @Override
    public void initialize(InitReason reason) {
        // get the configured predicate URI, falling back to our default if none was found
        String predicate = options.getParameter(PREDICATE_PARAM, DEFAULT_PREDICATE);

        predicateId = entities.put(SimpleValueFactory.getInstance().createIRI(predicate), Entities.Scope.SYSTEM);
    }

    // ...
}

Now that the plugin parameter has been declared, it can be configured either by adding the http://www.ontotext.com/trree/owlim#predicate-uri parameter to the GraphDB configuration, or by passing a -Dpredicate-uri system property to the JVM running GraphDB.

Accessing other plugins

Plugins can make use of the functionality of other plugins. For example, the Lucene-based full-text search plugin can make use of the rank values provided by the RDFRank plugin, to facilitate query result scoring and ordering. This is not a matter of re-using program code (e.g., in a .jar with common classes), but rather it is about re-using data. The mechanism to do this allows plugins to obtain references to other plugin objects by knowing their names. To achieve this, they only need to implement the PluginDependency interface:

public interface PluginDependency {
    public void setLocator(PluginLocator locator);
}

An instance of the PluginLocator interface is then injected into them during the configuration phase, and it performs the actual plugin discovery for them:

public interface PluginLocator {
    public Plugin locate(String name);
}

Having a reference to another plugin is all that is needed to call its methods directly and make use of its services.
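The lookup itself can be simulated in a standalone sketch. The registry below is a plain map standing in for the framework's internal plugin registry, and "rdfrank" is just an example name; the important detail is the locate-by-name contract and the null result for an unknown plugin:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone simulation of locating a plugin by name; the real
// PluginLocator is provided by the framework, not built from a map.
class LocatorSketch {
    public interface Plugin {
        String getName();
    }

    private final Map<String, Plugin> registry = new HashMap<>();

    public void register(Plugin plugin) {
        // plugin names are unique within a repository
        registry.put(plugin.getName(), plugin);
    }

    // mirrors PluginLocator.locate(String name): null when unknown
    public Plugin locate(String name) {
        return registry.get(name);
    }
}
```

A dependent plugin should therefore be prepared for locate() to return null, e.g., when the plugin it relies on is not deployed in the repository.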