Inferencing

Inference is the derivation of new knowledge from existing knowledge and axioms. In an RDF database, such as GraphDB, inference is used for deducing further knowledge based on existing RDF data and a formal set of inference rules.

Inference in GraphDB

GraphDB supports inference out of the box and provides updates to inferred facts automatically. Facts change all the time and the amount of resources it would take to manually manage updates or rerun the inferencing process would be overwhelming without this capability. This results in improved query speed, data availability and accurate analysis.

Inference uncovers the full power of data modeled with RDF(S) and ontologies. GraphDB will use the data and the rules to infer more facts and thus produce a richer dataset than the one you started with.

GraphDB can be configured via “rulesets” — sets of axiomatic triples and entailment rules — that determine the applied semantics. The implementation of GraphDB relies on a compile stage, during which the rules are compiled into Java source code that is then further compiled into Java bytecode and merged together with the inference engine.

Standard rulesets

The GraphDB inference engine provides full standard-compliant reasoning for RDFS, OWL-Horst, OWL2-RL, and OWL2-QL.

To apply a ruleset, simply choose from the options in the pull-down list when configuring your repository as shown below through GraphDB Workbench:

_images/inference1.png

Custom rulesets

GraphDB also comes with support for custom rulesets that allow for custom reasoning through the same performance optimised inference engine. The rulesets are defined via .pie files.

To load custom rulesets, simply point to the location of your .pie file as shown below:

_images/inference2.png

See how to configure the default inference value setting from the Workbench in Customizing Workbench behavior.

Checking inference derivation with the Proof plugin

The GraphDB proof plugin enables you to find out how a particular statement has been derived by the inferencer, or in other words which rule fired and which premises have been matched to produce that statement.

The plugin is available as open source.

Note

When using the Proof plugin alongside Fine-grained access control, the plugin will obey the FGAC checks and display only statements the user has access to. In practice this means that the explanation for any given inferred statement will be available to the user only if that user has access to all the statements that triggered the inference.

Predicates and namespace

The plugin supports the following predicates:

  • proof:explain: the subject will be bound to the state variable (a unique BNode in request scope). The object is a list with three arguments — the subject, predicate, and object of the statement to be explained.

    When the subject is bound with the id of the state variable, the other predicates can be used to fetch a part of the current solution (rulename, subject, predicate, object, and context of the matching premise).

    Upon re-evaluation, values from the next premise of the rule are used, or we proceed to the next solution to enumerate its premises for each of the rules that derive the statement.

    For brevity of the results, a solution is checked for whether it contains a premise that is equal to the source statement we are exploring. If so, that solution is skipped. This removes matches for self-supporting statements (i.e., when the same statement is also a premise of a rule that derives it).

  • proof:rule: if the subject is bound to the state variable, then the current solution is accessed through the context, and the object is bound to the rule name of the current solution as a Literal. If the source statement is explicit, the Literal “explicit” is bound to the object.

  • proof:subject: the subject is the state variable and the object is bound to the subject of the premise.

  • proof:predicate: the subject is the state variable and the object is bound to the predicate of the premise.

  • proof:object: the subject is the state variable and the object is bound to the object of the premise.

  • proof:context: the subject is the state variable and the object is bound to the context of the premise (or onto:explicit/onto:implicit).

The plugin namespace is http://www.ontotext.com/proof/, and its internal name is proof.

Usage and examples

When creating your repository, make sure to select the OWL-Horst ruleset, as the examples below cover inferences related to the owl:inverseOf and owl:intersectionOf predicates, for which OWL-Horst contains rules.

_images/proof_plugin_owl-horst.png

Simple example with owl:inverseOf

This example will investigate the relevant rule from a ruleset supporting the owl:inverseOf predicate, which looks like the one in the source .pie file:

Id: owl_invOf

  a b c
  b <owl:inverseOf> d
------------------------------------
  c d a

Add to the repository the following data that declares that urn:childOf is inverse property of urn:hasChild, and places a statement relating urn:John urn:childOf urn:Mary in a context named <urn:family>:

INSERT DATA {
    <urn:childOf> owl:inverseOf <urn:hasChild> .
    graph <urn:family> {
     <urn:John> <urn:childOf> <urn:Mary>
    }
}

The following query explains which rule has been triggered to derive the (<urn:Mary> <urn:hasChild> <urn:John>) statement. The arguments to the proof:explain predicate from the plugin are supplied by a VALUES expression for brevity:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX proof: <http://www.ontotext.com/proof/>
SELECT ?rule ?s ?p ?o ?context WHERE {
    VALUES (?subject ?predicate ?object) {(<urn:Mary> <urn:hasChild> <urn:John>)}
    ?ctx proof:explain (?subject ?predicate ?object) .
    ?ctx proof:rule ?rule .
    ?ctx proof:subject ?s .
    ?ctx proof:predicate ?p .
    ?ctx proof:object ?o .
    ?ctx proof:context ?context .
}

The result we get is:

_images/proof_example1.png

If we change the VALUES to:

VALUES (?subject ?predicate ?object) {
    (<urn:John> <urn:childOf> <urn:Mary>)
}

we are getting:

_images/proof_example2.png

If we change the VALUES further to:

VALUES (?subject ?predicate ?object) {
    (<urn:hasChild> owl:inverseOf <urn:childOf>)
}

the result we get is:

_images/proof_example3.png

As you can see, (owl:inverseOf, owl:inverseOf, owl:inverseOf) is implicit, and we can investigate further by altering the VALUES to:

VALUES (?subject ?predicate ?object) {
    (owl:inverseOf owl:inverseOf owl:inverseOf)
}

Where we will get:

_images/proof_example4.png

The .pie code for the related rule is as follows:

Id: owl_invOfBySymProp

  a <rdf:type> <owl:SymmetricProperty>
------------------------------------
  a <owl:inverseOf> a

If we track down the last premise, we will see that another rule supports it. (Note that both rules and the premises are axioms. Currently, the plugin does not check whether something is an axiom.)

Id: owl_SymPropByInverse

  a <owl:inverseOf> a
------------------------------------
  a <rdf:type> <owl:SymmetricProperty>

Example using bindings from other patterns

This more elaborate example demonstrates how to combine the bindings from regular SPARQL statement patterns and explore multiple statements.

We can define a helper JavaScript function that will return the local name of an IRI using the JavaScript functions plugin:

PREFIX jsfn:<http://www.ontotext.com/js#>
INSERT DATA {
    [] jsfn:register '''
    function lname(value) {
     if(value instanceof org.eclipse.rdf4j.model.IRI)
         return value.getLocalName();
     else
         return ""+value;
    }
'''
}

Next, the query will look for statements with ?subject bound to <urn:Mary>, and explain all of them. Note the use of the newly defined function lname() in the projection expression with concat():

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX onto: <http://www.ontotext.com/>
prefix proof: <http://www.ontotext.com/proof/>
PREFIX jsfn: <http://www.ontotext.com/js#>
SELECT (concat('(',jsfn:lname(?subject),',',jsfn:lname(?predicate),',',jsfn:lname(?object),')') as ?stmt)
    ?rule ?s ?p ?o ?context
WHERE {
    bind(<urn:Mary> as ?subject) .
    {?subject ?predicate ?object}

    ?ctx proof:explain (?subject ?predicate ?object) .
    ?ctx proof:rule ?rule .
    ?ctx proof:subject ?s .
    ?ctx proof:predicate ?p .
    ?ctx proof:object ?o .
    ?ctx proof:context ?context .
}

The results look as follows:

_images/proof_example5.png

The first result for (Mary, type, Resource) is derived from the rdf1_rdfs4a_4b_2 rule from the OWL-Horst ruleset which looks like:

Id: rdf1_rdfs4a_4b
    x  a  y
    -------------------------------
    x  <rdf:type>  <rdfs:Resource>
    a  <rdf:type>  <rdfs:Resource>
    y  <rdf:type>  <rdfs:Resource>

More complex example using other data

Let’s further explore the inference engine by adding the data below into the same repository:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT data {
    <urn:Red> a <urn:Colour> .
    <urn:White> a <urn:Colour> .
    <has:color> a rdf:Property .
    <urn:WhiteThing> a owl:Restriction;
                    owl:onProperty <has:color>;
                    owl:hasValue <urn:White> .
    <urn:RedThing> a owl:Restriction;
                    owl:onProperty <has:color>;
                    owl:hasValue <urn:Red> .
    <has:component> a rdf:Property .
    <urn:Wine> a owl:Restriction;
                    owl:onProperty <has:component>;
                    owl:someValuesFrom <urn:Grape> .
    <urn:RedWine> owl:intersectionOf (<urn:RedThing> <urn:Wine>) .
    <urn:WhiteWine> owl:intersectionOf (<urn:WhiteThing> <urn:Wine>) .
    <urn:Beer> a owl:Restriction;
                    owl:onProperty <has:component>;
                    owl:someValuesFrom <urn:Malt> .
    <urn:PilsenerMalt> a <urn:Malt> .
    <urn:PaleMalt> a <urn:Malt> .
    <urn:WheatMalt> a <urn:Malt> .

    <urn:MerloGrape> a <urn:Grape> .
    <urn:CaberneGrape> a <urn:Grape> .
    <urn:MavrudGrape> a <urn:Grape> .

    <urn:Merlo> <has:component> <urn:MerloGrape> ;
                <has:color> <urn:Red> .
}

It is a simple beverage ontology that uses owl:hasValue, owl:someValuesFrom, and owl:intersectionOf to classify instances based on the values of some of the ontology properties.

It contains:

  • two colors: Red and White;

  • classes of WhiteThings and RedThigs for the items related to has:color property to White and Red colors;

  • classes Wine and Beer for the items related to has:component to instances of Grape and Malt classes;

  • several instances of Grape (MerloGrape, CabernetGrape etc.) and Malt (PilsenerMalt, WheatMalt etc.);

  • classes RedWine and WhiteWine are declared as intersections of Wine with RedThings or WhiteThings with WhiteWine, respectively;

  • finally, we introduce an instance Merlo related to has:component with the value MerloGrape, and whose value for has:color is Red.

The expected inference is that Merlo is classified as RedWine because it is a member of both RedThings (since has:color is related to Red) and Wine (since has:component points to an object that is a member of the class Grape).

If we evaluate:

DESCRIBE <urn:Merlo>

We will get a Turtle document as follows:

<urn:Merlo> a rdfs:Resource, <urn:RedThing>, <urn:RedWine>,<urn:Wine>;
     <has:color> <urn:Red>;
     <has:component> <urn:MerloGrape> .

As you can see, the inferencer correctly derived that Merlo is a member of RedWine.

Now, let’s see how it derived this.

First, we will add some helper JavaScript functions to combine the results in more compact form as literals formed by the local names of the IRI components in the statements. We already introduced the js:lname() function from the previous examples, so we can reuse it to create a stmt() that concatenates several more items into a convenient literal:

PREFIX jsfn:<http://www.ontotext.com/js#>
INSERT DATA {
    [] jsfn:register '''
    function stmt(s, p, o, c) {
     return '('+lname(s)+', '+lname(p)+', '+lname(o)+(c?', '+lname(c):'')+')';
    }
'''
}

We also need a way to refer to a BNode using its label because SPARQL always generates unique BNodes during query evaluation:

PREFIX jsfn:<http://www.ontotext.com/js#>
INSERT DATA {
    [] jsfn:register '''
    function _bnode(value) {
        return org.eclipse.rdf4j.model.impl.SimpleValueFactory.getInstance().createBNode(value);
    }
'''
}

Now, let’s see how the (urn:Merlo rdf:type urn:RedWine) has been derived (note the use of js:stmt() function in the projection of the query). The query will use a SUBSELECT to provide bindings for ?subject, ?predicate, and ?object variables as a convenient way to easily add more statements to be explained by the plugin further.

PREFIX jsfn:<http://www.ontotext.com/js#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
prefix proof: <http://www.ontotext.com/proof/>
SELECT(jsfn:stmt(?subject,?predicate,?object) as ?stmt) ?rule (jsfn:stmt(?s,?p,?o,?context) as ?premise)
WHERE {
    {
        SELECT ?subject ?predicate ?object {
            VALUES (?subject ?predicate ?object) {
                    (<urn:Merlo> rdf:type <urn:RedWine>)
            }
        }
    }
    ?ctx proof:explain (?subject ?predicate ?object) .
    ?ctx proof:rule ?rule .
    ?ctx proof:subject ?s .
    ?ctx proof:predicate ?p .
    ?ctx proof:object ?o .
    ?ctx proof:context ?context .
}

The result looks like this:

_images/proof_example6.png

The first premise is explicit, and comes from the definition of RedWine class which is an owl:intersectionOf of an RDF list (_:node2) that hold the classes that form the intersection. The second premise relates Merlo with a predicate called _allTypes to the node from the intersection node. The inference is derived after applying the following rule:

Id: owl_typeByIntersect_1

  a <onto:_allTypes> b
  c <owl:intersectionOf> b
------------------------------------
  a <rdf:type> c

where a is bound to Merlo and c to RedWine.

Let’s add a (Merlo, _allTypes, _:node2) statement to the list of statements in the SUBSELECT that we used in the query. We will change the SUBSELECT to use a UNION, where for the second part, the ?object is bound to the right BNode that we created by using the helper js:_bnode() function and providing the id as a literal:

SELECT ?subject ?predicate ?object {
        {
            VALUES (?subject ?predicate ?object) {
                (<urn:Merlo> rdf:type <urn:RedWine>)
            }
        } UNION {
            bind (jsfn:_bnode('node2') as ?object)
            VALUES (?subject ?predicate) {
                (<urn:Merlo> <http://www.ontotext.com/_allTypes>)}
        }
}

The results of this evaluation are:

_images/proof_example7.png

We can see that (Merlo, _allTypes, _:node1) is derived by rule owl_typeByIntersect_3:

Id: owl_typeByIntersect_3

     a <rdf:first> b
     d <rdf:type> b
     a <rdf:rest> c
     d <onto:_allTypes> c
   ------------------------------------
     d <onto:_allTypes> a

where we have two explicit and two inferred statements matching the premises (Merlo, _allTypes, _:node2) and (Merlo, type, RedThing).

When we add to the list (Merlo, type, RedThing) first, the SUBSELECT is changed to:

SELECT ?subject ?predicate ?object {
        {
            VALUES (?subject ?predicate ?object) {
                (<urn:Merlo> rdf:type <urn:RedWine>)
                (<urn:Merlo> rdf:type <urn:RedThing>)
            }
        } UNION {
            bind (jsfn:_bnode('node2') as ?object)
            VALUES (?subject ?predicate) {
                (<urn:Merlo> <http://www.ontotext.com/_allTypes>)}
        }
}

The result is:

_images/proof_example8.png

We see that (Merlo, type, RedThing) is derived by matching rule owl_typeByHasVal with all explicit premises:

Id: owl_typeByHasVal

  a <owl:onProperty> b
  a <owl:hasValue> c
  d b c
------------------------------------
  d <rdf:type> a

where a is bound to RedThing and d to Merlo, etc.

Let’s add the other implicit statement that matched the owl_typeByInterset_3 rule (Merlo, _allTypes, _:node3). To do that, we will add another argument to the UNION in the SUBSELECT by introducing the _:node2 using the same js_bnode() function.

The SUBSELECT looks like this:

SELECT ?subject ?predicate ?object {
        {
            VALUES (?subject ?predicate ?object) {
                (<urn:Merlo> rdf:type <urn:RedWine>)
                (<urn:Merlo> rdf:type <urn:RedThing>)
            }
        } UNION {
            bind (jsfn:_bnode('node2') as ?object)
            VALUES (?subject ?predicate) {
                (<urn:Merlo> <http://www.ontotext.com/_allTypes>) }
        } UNION {
        bind (jsfn:_bnode('node3') as ?object)
            VALUES (?subject ?predicate) {
                (<urn:Merlo> <http://www.ontotext.com/_allTypes>) }
        }
}

Evaluating it returns the following:

_images/proof_example9.png

The statement (Merlo, _allTypes, _:node2) was derived by owl_typeByIntersect_2 and the only implicit statement matching as premise is (Merlo, type, Wine).

The owl_typeByIntersect_2 rule looks like this:

Id: owl_typeByIntersect_2

  a <rdf:first> b
  a <rdf:rest> <rdf:nil>
  c <rdf:type> b
------------------------------------
  c <onto:_allTypes> a

where c is bound to Merlo and b to Wine.

Let’s add the (Merlo, type, Wine) to the SUBSELECT we used to explore as another UNION using VALUES, and evaluate the query again:

SELECT ?subject ?predicate ?object {
        {
            values (?subject ?predicate ?object) {
                (<urn:Merlo> rdf:type <urn:RedWine>)
                (<urn:Merlo> rdf:type <urn:RedThing>)
            }
        } UNION {
            bind (jsfn:_bnode('node2') as ?object)
            values (?subject ?predicate) {
                (<urn:Merlo> <http://www.ontotext.com/_allTypes>)}
        } UNION {
 bind (jsfn:_bnode('node3') as ?object)
            VALUES (?subject ?predicate) {
                (<urn:Merlo> <http://www.ontotext.com/_allTypes>)}
        } UNION {
            values (?subject ?predicate ?object) {
                (<urn:Merlo> rdf:type <urn:Wine>)
            }
        }
}

The results have now been expanded, as shown in lines 13-16:

_images/proof_example10.png

These come from rule owl_typeBySomeVal where all premises matching it were explicit. The rule looks like:

Id: owl_typeBySomeVal

  a <rdf:type> b
  c <owl:onProperty> d
  c <owl:someValuesFrom> b
  e d a
------------------------------------
  e <rdf:type> c

where e is bound to Merlo, d to has:component, a to MerloGrape, b to Grape, etc.

In conclusion, while the chain is rather obscure, we were able to find out how the inferencer derived (<urn:Merlo> rdf:type <urn:RedWine>):

  • (Merlo, type, Wine) was derived by rule owl_typeBySomeVal from all explicit statements.

  • (Merlo, type, RedThing) was derived by rule owl_typeByHasVal from explicit statements.

  • (Merlo, _allTypes, _:node2) was derived by rule owl_typeByIntersect_2 with combination of some explicit and the inferred (Merlo, type, Wine).

  • (Merlo, _allTypes, _:node1) was derived by rule owl_typeByIntersect_3 with combination of explicit and inferred (Merlo, type, RedThing) and (Merlo, _allTypes, _:node2).

  • And finally, (Merlo, type, RedWine) was derived by owl_typeByIntersect_1 from explicit (RedWine, intersectionOf, _:node1) and inferred (Merlo, _allTypes, _:node1).

Query time inference with the Provenance plugin

The provenance plugin enables the generation of inference closure from a specific named graph at query time. This is useful in situations when you want to trace what the implicit statements generated from a specific graph are and the axiomatic triples part of the configured ruleset, i.e., the ones inserted with a special predicate sys:schemaTransaction. For more information, check Reasoning.

By default, GraphDB’s forward-chaining inferencer materializes all implicit statements in the default graph. Therefore, it is impossible to trace which graphs these implicit statements are coming from. The provenance plugin provides the opposite approach. With the configured ruleset, the reasoner does forward-chaining over a specific named graph and generates all its implicit statements at query time.

Predicates

The plugin predicates gives you an easy access to the graph, which implicit statements you want to generate. The process is similar to the RDF reification. All plugin’s predicates start with <http://www.ontotext.com/provenance/>:

Plugin predicates

Semantics

http://www.ontotext.com/provenance/derivedFrom

Creates a request scope for the graph with the inference closure

http://www.ontotext.com/provenance/subject

Binds all subjects part of the inference closure

http://www.ontotext.com/provenance/predicate

Binds all predicates part of the inference closure

http://www.ontotext.com/provenance/object

Binds all objects part of the inference closure

Registering the plugin

The plugin is not registered by default.

  1. To register it, start GraphDB with the following parameter:

    ./graphdb -Dregister-plugins=com.ontotext.trree.plugin.provenance.ProvenancePlugin
    
  2. Check the startup log to validate that the plugin has started correctly.

    [INFO ] 2016-11-18 19:47:19,134 [http-nio-7200-exec-2 c.o.t.s.i.PluginManager] Initializing plugin 'provenance'
    

Usage and examples

  1. In the Workbench SPARQL editor, add the following data as schema transaction:

    PREFIX ex: <http://example.com/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    INSERT data {
        [] <http://www.ontotext.com/owlim/system#schemaTransaction> [] .
    
            ex:BusStop a rdfs:Class .
            ex:SkiResort a rdfs:Class .
            ex:WebCam a rdfs:Class .
            ex:OutdoorWebCam a rdfs:Class .
            ex:Place a rdfs:Class .
    
            ex:BusStop rdfs:subClassOf ex:Place .
            ex:SkiResort rdfs:subClassOf ex:Place .
            ex:WebCam rdfs:subClassOf ex:Place .
            ex:OutdoorWebCam rdfs:subClassOf ex:WebCam .
    
    }
    
  2. Add the following data as normal transaction:

    PREFIX ex: <http://example.com/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    INSERT data {
        GRAPH ex:g1{
            ex:webcam_g1 a ex:OutdoorWebCam .
            ex:webcam_g1 ex:containedIn ex:skiresort .
        }
         GRAPH ex:g1a{
            ex:webcam_g1a a ex:OutdoorWebCam .
            ex:webcam_g1a ex:containedIn ex:skiresort .
        }
        GRAPH ex:g2{
            ex:skiresort a ex:SkiResort .
            ex:skiresort ex:publicTransport ex:busstop .
    
        }
        GRAPH ex:g3{
            ex:busstop a ex:BusStop .
            ex:busstop ex:nearBy ex:skiresort .
        }
    }
    
  3. If we run the following query not using the plugin, it will return solutions over all statements that were inferred during the loading of the data:

    PREFIX ex: <http://example.com/>
    SELECT * {
        ?webCam a ex:WebCam .
        ?webCam ex:containedIn ?skiresort .
        ?busstop ex:nearBy ?skiresort .
    }
    

    The result will have two solutions.

  4. This query showcases the newly introduced Provenance plugin predicate pr:derive. The computed closure is accessible in a dedicated special context whose name is provided as a subject of that predicate pattern. The patterns in the scope of that context are evaluated over the computed closure by the provenance plugin with the selected data. This allows for flexibility of use, as more than one context can be selected to supply data for processing, e.g., pr:derive (ex:g1 ex:g2).

    PREFIX pr: <http://www.ontotext.com/provenance/>
    PREFIX ex: <http://example.com/>
    select * {
        pr:derive1 pr:derive (ex:g1 ex:g2 ex:g3)  .
        GRAPH pr:derive1
        {
            ?webCam a ex:WebCam .
            ?webCam ex:containedIn ?skiresort .
            ?busstop ex:nearBy ?skiresort .
        }
    }
    

    This will return one solution since only the data from ex:g1, ex:g2, and ex:g3 will be used, so that the solution dependent on the data within ex:g1a will not be part of the result.

Note

During evaluation, the inferences and the data are kept in-memory, so the plugin should be used with relatively small sets of statements placed in contexts.