Optimisation of owl:sameAs

The owl:sameAs optimisation uses the OWL owl:sameAs property to create an equivalence class between nodes of an RDF graph. An equivalence class has the following properties:

  • Reflexivity, i.e., A -> A
  • Symmetry, i.e., if A -> B then B -> A
  • Transitivity, i.e., if A -> B and B -> C then A -> C
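To see why materialising owl:sameAs is costly, the sketch below computes the reflexive, symmetric and transitive closure of a set of owl:sameAs pairs by brute force. It is a hypothetical illustration: the function name `sameas_closure` is not part of GraphDB. For the chain used in Example 1 below (:a = :d, :d = :e), it yields all 3 × 3 = 9 owl:sameAs statements.

```python
from itertools import product


def sameas_closure(pairs):
    """Return the reflexive, symmetric and transitive closure of owl:sameAs pairs."""
    closure = set()
    for a, b in pairs:
        closure.update({(a, a), (b, b), (a, b), (b, a)})  # reflexivity + symmetry
    changed = True
    while changed:  # repeat until no new transitive pair is found
        changed = False
        for (x, y), (y2, z) in product(list(closure), repeat=2):
            if y == y2 and (x, z) not in closure:
                closure.add((x, z))
                changed = True
    return closure


explicit = [(":a", ":d"), (":d", ":e")]
closure = sameas_closure(explicit)
print(len(closure))  # 9
for s, o in sorted(closure):
    print(s, "owl:sameAs", o)
```

For a class of N nodes, the closure always contains N^2 statements. This is the growth that the single-node representation described next avoids.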

Instead of using simple rules and axioms for owl:sameAs (in fact, two axioms stating that it is symmetric and transitive), GraphDB offers an efficient non-rule implementation, i.e., the owl:sameAs support is hard-coded. The rules are commented out in the PIE files and are kept only as a reference.

In GraphDB, the equivalence class is represented by a single node. This avoids the explosion of all N^2 owl:sameAs statements. Instead, the members of the equivalence class are stored in a separate structure. In this way, the ID of the equivalence class can be used as an ordinary node, which eliminates the need to copy statements by subject, predicate and object. All these copies are replaced by a single statement.
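The saving can be sketched as follows. Instead of storing one copy of a statement per member of an equivalence class, each node is replaced by its class representative before the statement is stored. The dictionary `representative` and the helper `to_representative` are hypothetical and only illustrate the idea. GraphDB's internal structures are not described here.

```python
# Hypothetical mapping from each node to the representative of its equivalence class.
representative = {":a": ":a", ":d": ":a", ":e": ":a"}


def to_representative(triple):
    """Rewrite every position of a triple to its class representative."""
    return tuple(representative.get(node, node) for node in triple)


# Without the optimisation, a reasoner would materialise one copy per class member.
copies = [(":a", ":b", ":c"), (":d", ":b", ":c"), (":e", ":b", ":c")]

stored = {to_representative(t) for t in copies}
print(stored)  # {(':a', ':b', ':c')}
```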

There is no restriction on how to choose the single node that represents the class as a whole. GraphDB uses the first node that enters the class. After such a class is created, all statements with nodes from this class are altered to use the class representative. These statements also participate in the inference.

The equivalence classes may grow when more owl:sameAs statements containing nodes from the class are added to the repository. Every time you add a new owl:sameAs statement linking two classes, they merge into a single class.
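The growing and merging of classes can be modelled with a union-find structure. The sketch below is hypothetical: the class name `SameAsClasses` is ours. Following the text above, the representative is the first node that entered the class. We additionally assume that when two classes merge, the representative that entered the repository earlier survives. The source does not specify this tie-break.

```python
class SameAsClasses:
    """Union-find sketch of owl:sameAs equivalence classes.

    Assumption: when two classes merge, the representative that was
    seen first (earliest insertion order) is kept.
    """

    def __init__(self):
        self.parent = {}
        self.order = {}  # node -> position in which it was first seen

    def _add(self, node):
        if node not in self.parent:
            self.parent[node] = node
            self.order[node] = len(self.order)

    def find(self, node):
        """Return the representative of the class containing node."""
        self._add(node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        while node != root:  # path compression
            nxt = self.parent[node]
            self.parent[node] = root
            node = nxt
        return root

    def same_as(self, x, y):
        """Record <x owl:sameAs y>, merging the two classes if they differ."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.order[ry] < self.order[rx]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # the older representative survives

    def members(self, node):
        root = self.find(node)
        return {n for n in list(self.parent) if self.find(n) == root}


classes = SameAsClasses()
classes.same_as(":a", ":d")
classes.same_as(":x", ":y")
print(classes.find(":y"))  # :x
classes.same_as(":d", ":y")  # links the two classes, so they merge
print(classes.find(":y"))  # :a
print(sorted(classes.members(":x")))  # [':a', ':d', ':x', ':y']
```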

During query evaluation, GraphDB uses a kind of backward-chaining by enumerating equivalent URIs, thus guaranteeing the completeness of the inference and query results. It takes special care to ensure that this optimization does not hinder the ability to distinguish between explicit and implicit statements.
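The query-time side can be sketched as follows. A statement stored once with representatives is expanded back to every combination of equivalent URIs before it is matched against a triple pattern. The `classes` mapping and the functions `expand` and `match` are hypothetical. `None` stands for a query variable.

```python
from itertools import product

# Hypothetical state: representative -> members of its equivalence class.
classes = {":a": [":a", ":d", ":e"]}

# Statements are stored once, using representatives only.
stored = [(":a", ":b", ":c")]


def expand(triple):
    """Yield every triple obtained by replacing each node with an equivalent one."""
    options = [classes.get(node, [node]) for node in triple]
    yield from product(*options)


def match(pattern):
    """Evaluate a triple pattern in which None acts as a variable."""
    for stmt in stored:
        for t in expand(stmt):
            if all(p is None or p == v for p, v in zip(pattern, t)):
                yield t


print(sorted(match((None, ":b", ":c"))))
# [(':a', ':b', ':c'), (':d', ':b', ':c'), (':e', ':b', ':c')]
```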

Removing owl:sameAs statements

When removing owl:sameAs statements from the repository, some nodes may become detached from the class they belonged to, the class may split into two or more classes, or it may disappear altogether. To determine how the classes behave in each particular case, you need to track which owl:sameAs statements were originally added and which of them remain in the repository. All statements coming from the user (either through a SPARQL query or through the RDF4J API) are marked as explicit, and every statement derived from them during inference is marked as inferred. So, by knowing which explicit owl:sameAs statements remain, you can rebuild the equivalence classes.

Note

It is not necessary to rebuild all the classes but only the ones that were referred to by the removed owl:sameAs statements.
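The rebuild step can be sketched as follows, assuming the remaining explicit owl:sameAs statements are available as pairs. The affected classes are recomputed as connected components, starting only from the nodes mentioned in the removed statements. `rebuild_affected` is a hypothetical helper, not a GraphDB function.

```python
from collections import defaultdict, deque


def rebuild_affected(explicit_pairs, removed_pairs):
    """Recompute the equivalence classes touched by removed owl:sameAs statements.

    explicit_pairs: explicit owl:sameAs statements still in the repository.
    removed_pairs:  owl:sameAs statements that were just deleted.
    Returns a list of classes (sets of nodes); a singleton means the node left its class.
    """
    graph = defaultdict(set)
    for x, y in explicit_pairs:  # owl:sameAs is symmetric, so link both ways
        graph[x].add(y)
        graph[y].add(x)

    seeds = {node for pair in removed_pairs for node in pair}
    seen, classes = set(), []
    for seed in sorted(seeds):
        if seed in seen:
            continue
        component, queue = set(), deque([seed])
        while queue:  # breadth-first search follows owl:sameAs transitively
            node = queue.popleft()
            if node in component:
                continue
            component.add(node)
            queue.extend(graph[node] - component)
        seen |= component
        classes.append(component)
    return classes


# Originally :a = :d = :e; the user deletes <:a owl:sameAs :d>.
remaining = [(":d", ":e")]
result = rebuild_affected(remaining, removed_pairs=[(":a", ":d")])
print([sorted(c) for c in result])  # [[':a'], [':d', ':e']]
```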

When nodes are removed from classes, or when classes split or disappear, the new classes (or the removal of classes) yield new representatives. So, statements using the old representatives must be replaced with statements using the new ones. This is also achieved by knowing which statements are explicit. The representative statements (i.e., statements that use representative nodes) are flagged as a special type of statement that may cease to exist after changes to the equivalence classes. New representative statements are built from the explicit statements and the new state of the equivalence classes. It is not necessary to process all statements when only a single equivalence class has changed. The key point is that the representative statements, although volatile, are visible to SPARQL queries and to the inferencer, whereas the explicit statements that use nodes from the equivalence classes remain invisible and are only used for rebuilding the representative statements.
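The regeneration of representative statements can be sketched as a pure function of the invisible explicit statements and the current classes. This is a hypothetical model. Each class is encoded as an ordered list whose first element is the representative. The extra statement `(:e, :p, :f)` is invented for illustration.

```python
def representative_statements(explicit, classes):
    """Rebuild the visible representative statements.

    explicit: explicit statements as (s, p, o) tuples; they stay invisible to queries.
    classes:  equivalence classes, each an ordered list whose first element
              is the representative (hypothetical encoding).
    """
    rep = {node: members[0] for members in classes for node in members}
    return {tuple(rep.get(n, n) for n in stmt) for stmt in explicit}


explicit = [(":a", ":b", ":c"), (":e", ":p", ":f")]

before = [[":a", ":d", ":e"]]   # one class, represented by :a
after = [[":a"], [":d", ":e"]]  # <:a owl:sameAs :d> was removed

print(sorted(representative_statements(explicit, before)))
# [(':a', ':b', ':c'), (':a', ':p', ':f')]
print(sorted(representative_statements(explicit, after)))
# [(':a', ':b', ':c'), (':d', ':p', ':f')]
```

In line with the text above, an incremental version would only reprocess the explicit statements that mention nodes of the changed classes, rather than the whole set.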

Disabling the owl:sameAs support

By default, the owl:sameAs support is enabled in all rulesets except for empty (without inference). However, disabling the owl:sameAs behaviour may be beneficial in some cases. For example, it can save time, or you may want to visualize your data without the statements that owl:sameAs generates in queries or in inference.

To disable owl:sameAs, use:

  • (for individual queries) the FROM onto:disable-sameAs system graph;
  • (for the whole repository) the disable-sameAs configuration parameter (boolean, defaults to ‘false’). This disables all owl:sameAs inference in the repository.

Disabling owl:sameAs by query does not remove the inferences that have already taken place because of owl:sameAs.

Consider the following example:

PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
  <urn:A> owl:sameAs <urn:B> .
  <urn:A> a <urn:Class1> .
  <urn:B> a <urn:Class2> .
}

If you then define the intersection of the two classes, <urn:A> and <urn:B> become instances of it:

PREFIX : <http://test.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
  :Intersection owl:intersectionOf (<urn:Class1> <urn:Class2>) .
}

If you query what instances the intersection has:

PREFIX : <http://test.com/>

SELECT * {
  ?s a :Intersection .
}

the response will be <urn:A> and <urn:B>. Using FROM onto:disable-sameAs returns only the equivalence class representative (e.g., <urn:A>), but it does not disable the inference as a whole.

In contrast, when you set up a repository with the disable-sameAs repository parameter set to true, the inference <urn:A> a :Intersection does not take place. Then, if you query what instances the intersection has, the query returns neither <urn:A> nor <urn:B>.

Apart from this difference in scope, disabling owl:sameAs through the repository parameter and through a FROM clause in the query behaves in the same way.
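The three outcomes of the <urn:A>/<urn:B> example can be reproduced with a toy simulation. This is a hypothetical model, not GraphDB code. Inference runs over representative statements, the representative is assumed to be urn:A (the first node that entered the class), and query results are expanded to equivalent URIs unless FROM onto:disable-sameAs is used.

```python
def intersection_instances(repo_disable_sameas, query_disable_sameas):
    """Simulate the <urn:A>/<urn:B> example; return the instances of :Intersection."""
    types = [("urn:A", "urn:Class1"), ("urn:B", "urn:Class2")]
    # With owl:sameAs support, both nodes share the representative urn:A.
    members = {} if repo_disable_sameas else {"urn:A": ["urn:A", "urn:B"]}
    rep = {m: r for r, ms in members.items() for m in ms}

    # Inference runs over representative statements.
    classes_of = {}
    for node, cls in types:
        classes_of.setdefault(rep.get(node, node), set()).add(cls)
    inferred = [n for n, cs in classes_of.items() if {"urn:Class1", "urn:Class2"} <= cs]

    if query_disable_sameas:  # FROM onto:disable-sameAs: no expansion to equivalent URIs
        return sorted(inferred)
    return sorted(m for n in inferred for m in members.get(n, [n]))


print(intersection_instances(False, False))  # ['urn:A', 'urn:B']
print(intersection_instances(False, True))   # ['urn:A']
print(intersection_instances(True, False))   # []
```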

How disable-sameAs interferes with the different rulesets

The following parameters can affect the owl:sameAs behaviour:

  • ruleset – owl:sameAs support is enabled for all rulesets, except the empty ruleset. Switching to a non-empty ruleset (e.g., owl-horst-optimized) enables the inference. If the inference is then recomputed, the results show all inferred statements, as well as the ones generated by owl:sameAs. They do not include any <P a rdf:Property> and <X a rdfs:Resource> statements (see GraphDB ruleset usage optimisation).
  • disable-sameAs: true + inference – disables the owl:sameAs expansion but still shows the other implicit statements. However, these results differ both from the ones retrieved with owl:sameAs enabled + inference and from the ones retrieved without inference.
  • FROM onto:disable-sameAs – including this clause in a query produces different results with different rulesets.
  • FROM onto:explicit – using only this clause (or with FROM onto:disable-sameAs) produces the same results as when the inferencer is disabled (as with the empty ruleset). This means that the ruleset and the disable-sameAs parameter do not affect the results.
  • FROM onto:explicit + FROM onto:implicit – produces the same results as if both clauses are omitted.
  • FROM onto:implicit – using this clause returns only the statements derived by the inferencer. Therefore, with the empty ruleset, it is expected to produce no results.
  • FROM onto:implicit + FROM onto:disable-sameAs – shows all inferred statements (except for the ones generated by owl:sameAs).
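The combinations listed above can be summarised in a small model. The model is hypothetical and is not GraphDB's implementation. Each statement is tagged as explicit, implicit, and/or derived through owl:sameAs. The statement set is the one from Example 1 below, and the model reproduces the results of Examples 1, 3 and 4.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    triple: tuple
    explicit: bool
    implicit: bool
    via_sameas: bool  # derived (possibly also) through owl:sameAs


def visible(stmts, explicit=False, implicit=False, disable_sameas=False):
    """Hypothetical model of FROM onto:explicit / onto:implicit / onto:disable-sameAs."""
    if explicit == implicit:  # neither clause, or both: same as omitting them
        explicit = implicit = True
    return {
        s.triple
        for s in stmts
        if (explicit and s.explicit)
        or (implicit and s.implicit and not (disable_sameas and s.via_sameas))
    }


nodes = [":a", ":d", ":e"]
stmts = [
    Stmt((":a", ":b", ":c"), True, False, False),
    Stmt((":b", "rdf:type", "rdf:Property"), False, True, False),
    Stmt((":b", "rdfs:subPropertyOf", ":b"), False, True, False),
    Stmt((":d", ":b", ":c"), False, True, True),
    Stmt((":e", ":b", ":c"), False, True, True),
]
# The nine owl:sameAs statements of {:a, :d, :e}; two are also explicit.
for x in nodes:
    for y in nodes:
        is_explicit = (x, y) in {(":a", ":d"), (":d", ":e")}
        stmts.append(Stmt((x, "owl:sameAs", y), is_explicit, True, True))

print(len(visible(stmts)))                       # 14, as in Example 1
print(len(visible(stmts, disable_sameas=True)))  # 5, as in Example 3
print(sorted(visible(stmts, implicit=True, disable_sameas=True)))  # Example 4
```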

The following examples illustrate this behaviour:

Example 1

If you use owl:sameAs with the following statements:

PREFIX : <http://test.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
  :a :b :c .
  :a owl:sameAs :d .
  :d owl:sameAs :e .
}

and you want to retrieve data with this query:

PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>

DESCRIBE :a :b :c :d :e

without inference, the result contains only the explicit statements, which is the same result you get by adding FROM onto:explicit.

However, if you enable the inference, you will see a completely different picture. For example, if you use owl-horst-optimized, disable-sameAs=false, you will receive the following results:

:a :b :c .
:a owl:sameAs :a .
:a owl:sameAs :d .
:a owl:sameAs :e .
:b a rdf:Property .
:b rdfs:subPropertyOf :b .
:d owl:sameAs :a .
:d owl:sameAs :d .
:d owl:sameAs :e .
:e owl:sameAs :a .
:e owl:sameAs :d .
:e owl:sameAs :e .
:d :b :c .
:e :b :c .

Example 2

If you start with the empty ruleset, then switch to owl-horst-optimized:

PREFIX sys: <http://www.ontotext.com/owlim/system#>

INSERT DATA {
  _:b sys:addRuleset "owl-horst-optimized" .
  _:b sys:defaultRuleset "owl-horst-optimized" .
}

and compute the full inference closure:

PREFIX sys: <http://www.ontotext.com/owlim/system#>

INSERT DATA {
  _:b sys:reinfer _:b .
}

the same DESCRIBE query will return:

:a :b :c .
:a owl:sameAs :a .
:a owl:sameAs :d .
:a owl:sameAs :e .
:d owl:sameAs :a .
:d owl:sameAs :d .
:d owl:sameAs :e .
:e owl:sameAs :a .
:e owl:sameAs :d .
:e owl:sameAs :e .
:d :b :c .
:e :b :c .

i.e., without the <P a rdf:Property> and <P rdfs:subPropertyOf P> statements.

Example 3

If you start with owl-horst-optimized and set the disable-sameAs parameter to true or use FROM onto:disable-sameAs, you will receive:

:a :b :c .
:a owl:sameAs :d .
:b a rdf:Property .
:b rdfs:subPropertyOf :b .
:d owl:sameAs :e .

i.e., the explicit statements plus the <type Property> statements.

Example 4

This query:

PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>

DESCRIBE :a :b :c :d :e
FROM onto:implicit
FROM onto:disable-sameAs

yields:

:b a rdf:Property .
:b rdfs:subPropertyOf :b .

because all owl:sameAs statements and the statements generated from them (<:d :b :c>, <:e :b :c>) will not be shown.

Note

The same is achieved with the disable-sameAs repository parameter set to true. However, if you start with the empty ruleset and then switch to a non-empty ruleset, the latter query will not return any results. If you start with owl-horst-optimized and then switch to empty, <type Property> will persist, i.e., the latter query will return some results.

Example 5

If you use named graphs, the results will look different:

PREFIX : <http://test.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
  GRAPH :graph {
    :a :b :c .
    :a owl:sameAs :d .
    :d owl:sameAs :e .
  }
}

Then the test query will be:

PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>

SELECT DISTINCT *
{
  GRAPH ?g {
    ?s ?p ?o
    FILTER (
      ?s IN (:a, :b, :c, :d, :e, :graph) ||
      ?p IN (:a, :b, :c, :d, :e, :graph) ||
      ?o IN (:a, :b, :c, :d, :e, :graph) ||
      ?g IN (:a, :b, :c, :d, :e, :graph)
    )
  }
}

If you have started with owl-horst-optimized, disable-sameAs=false, you will receive:

:graph {
  :a :b :c .
  :a owl:sameAs :d .
  :d owl:sameAs :e .
}

because the statements from the default graph are not automatically included. This is the same as in the DESCRIBE query, where using both FROM onto:explicit and FROM onto:implicit nullifies them.

So, if you want to see all the statements, you should write:

PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>

SELECT DISTINCT *
FROM NAMED onto:explicit
FROM NAMED onto:implicit
{
  GRAPH ?g {
    ?s ?p ?o
    FILTER (
      ?s IN (:a, :b, :c, :d, :e, :graph) ||
      ?p IN (:a, :b, :c, :d, :e, :graph) ||
      ?o IN (:a, :b, :c, :d, :e, :graph) ||
      ?g IN (:a, :b, :c, :d, :e, :graph)
    )
  }
}
ORDER BY ?g ?s

Note that when querying quads, you should use the FROM NAMED clause, and when querying triples, the FROM clause. Using FROM NAMED with triples or FROM with quads has no effect. The query above returns the following:

:graph {
  :a :b :c .
  :a owl:sameAs :d .
  :d owl:sameAs :e .
}
onto:implicit {
  :b a rdf:Property .
  :b rdfs:subPropertyOf :b .
}
onto:implicit {
  :a owl:sameAs :a .
  :a owl:sameAs :d .
  :a owl:sameAs :e .
  :d owl:sameAs :a .
  :d owl:sameAs :d .
  :d owl:sameAs :e .
  :e owl:sameAs :a .
  :e owl:sameAs :d .
  :e owl:sameAs :e .
}
onto:implicit {
  :d :b :c .
  :e :b :c .
}

In this case, the explicit statements <:a owl:sameAs :d> and <:d owl:sameAs :e> also appear as implicit. They do not appear twice when querying triples, because the iterators return unique triples. When querying quads, however, you can see all statements.
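The difference between the triple and quad views can be mimicked in a few lines. The graph component is what keeps the explicit and implicit copies apart. The data below is a hand-copied subset of the result above, and the code does not use any GraphDB API.

```python
# A subset of the quads returned above: the same triple occurs in two graphs.
quads = [
    (":a", "owl:sameAs", ":d", ":graph"),
    (":d", "owl:sameAs", ":e", ":graph"),
    (":a", "owl:sameAs", ":d", "onto:implicit"),
    (":d", "owl:sameAs", ":e", "onto:implicit"),
    (":a", "owl:sameAs", ":e", "onto:implicit"),
]

# A triple view drops the graph position; keeping only unique triples
# (order-preserving) collapses the explicit and implicit copies.
triples = list(dict.fromkeys(q[:3] for q in quads))

print(len(quads), len(triples))  # 5 3
```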

Here, FROM NAMED onto:explicit, FROM NAMED onto:implicit and FROM NAMED onto:disable-sameAs have the same effects as described above, and the <type Property> statements behave in the same way.