Optimisation of owl:sameAs
The owl:sameAs optimisation uses the OWL owl:sameAs property to create an equivalence class between two nodes of an RDF graph. An equivalence class has the following properties:
- Reflexivity, i.e., A -> A
- Symmetry, i.e., if A -> B then B -> A
- Transitivity, i.e., if A -> B and B -> C then A -> C
Instead of using simple rules and axioms for owl:sameAs (in fact, two axioms stating that it is symmetric and transitive), GraphDB offers an efficient non-rule implementation: the owl:sameAs support is hard-coded. The corresponding rules are commented out in the PIE files and are kept only for reference.
In GraphDB, an equivalence class is represented by a single node. This avoids the explosion of all N^2 owl:sameAs statements; instead, the members of the equivalence class are stored in a separate structure. The ID of the equivalence class can then be used as an ordinary node, which eliminates the need to copy statements by subject, predicate and object: all these copies are replaced by a single statement.
There is no restriction on how to choose the single node that represents the class as a whole; it is the first node that enters the class. After such a class is created, all statements with nodes from the class are altered to use the class representative. These statements also participate in the inference.
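The representation described above can be sketched in Python as follows. This is a hypothetical model, not GraphDB's internal code. The names `build_class`, `to_representative` and `naive_copies` are illustrative, and a class is modelled as a plain dictionary that maps each member to its representative. The sketch compares one stored representative statement with the copies and owl:sameAs statements that rule-based inference would otherwise materialise.

```python
from itertools import product

def build_class(nodes_in_order):
    """Map every member of one class to the first node that entered it."""
    representative = nodes_in_order[0]
    return {node: representative for node in nodes_in_order}

def to_representative(statement, rep_of):
    """Rewrite a triple so that each class member is replaced by its representative."""
    return tuple(rep_of.get(node, node) for node in statement)

def naive_copies(statement, members):
    """What rule-based owl:sameAs inference would materialise: every combination of equivalents."""
    options = [members if node in members else [node] for node in statement]
    return set(product(*options))

members = [":a", ":d", ":e"]   # :a entered the class first
rep_of = build_class(members)
triple = (":d", ":b", ":c")

print(to_representative(triple, rep_of))   # (':a', ':b', ':c')
print(len(naive_copies(triple, members)))  # 3 copies, one per member as subject
print(len(members) ** 2)                   # 9 owl:sameAs statements not materialised
```

A single stored triple stands in for all three copies, and the class structure replaces the nine owl:sameAs statements.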
Equivalence classes may grow as more owl:sameAs statements containing nodes from a class are added to the repository. Every time you add a new owl:sameAs statement linking two classes, they merge into a single class.
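Here is a minimal Python sketch of classes growing and merging, using a union-find-like structure. The class `SameAsClasses` and its methods are hypothetical. Treating a lone node as a one-member class is a modelling device. The rule that a merge keeps the representative of the older class is an assumption: the text above only states that the representative is the first node that enters a class.

```python
class SameAsClasses:
    """Hypothetical model of growing and merging owl:sameAs equivalence classes."""

    def __init__(self):
        self.rep = {}      # node -> representative of its class
        self.members = {}  # representative -> members in order of entry
        self.created = {}  # representative -> creation order of its class
        self._clock = 0

    def representative(self, node):
        # A node that is in no class stands for itself.
        return self.rep.get(node, node)

    def _ensure_class(self, node):
        if node not in self.rep:
            self.rep[node] = node
            self.members[node] = [node]
            self.created[node] = self._clock
            self._clock += 1

    def add_same_as(self, a, b):
        self._ensure_class(a)
        self._ensure_class(b)
        ra, rb = self.rep[a], self.rep[b]
        if ra == rb:
            return  # already equivalent
        # Assumption: the older class absorbs the newer one and keeps its representative.
        keep, drop = (ra, rb) if self.created[ra] <= self.created[rb] else (rb, ra)
        for node in self.members.pop(drop):
            self.rep[node] = keep
            self.members[keep].append(node)
        del self.created[drop]

classes = SameAsClasses()
classes.add_same_as(":a", ":d")
classes.add_same_as(":x", ":y")
classes.add_same_as(":d", ":y")        # links the two classes, so they merge
print(classes.representative(":y"))    # :a
print(classes.members[":a"])           # [':a', ':d', ':x', ':y']
```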
During query evaluation, GraphDB uses a kind of backward chaining that enumerates equivalent URIs, thus guaranteeing the completeness of the inference and query results. It takes special care to ensure that this optimisation does not hinder the ability to distinguish between explicit and implicit statements.
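The query-time enumeration can be sketched as follows. The sketch assumes that only representative statements are stored. The names `members`, `stored`, `equivalents` and `match` are hypothetical. A triple pattern is looked up through representatives, and its unbound positions are then expanded to all equivalent URIs. This reproduces the <:d :b :c> and <:e :b :c> results from the examples further on, without storing them.

```python
from itertools import product

members = {":a": [":a", ":d", ":e"]}   # representative -> class members
rep_of = {m: r for r, ms in members.items() for m in ms}
stored = {(":a", ":b", ":c")}          # only representative statements are stored

def equivalents(node):
    """All URIs equivalent to node (just the node itself if it is in no class)."""
    return members.get(rep_of.get(node, node), [node])

def match(pattern):
    """Yield every triple matching pattern, where None is an unbound position."""
    # Bound positions are looked up through their representatives.
    rep_pattern = [None if n is None else rep_of.get(n, n) for n in pattern]
    for statement in stored:
        if any(want is not None and want != got for want, got in zip(rep_pattern, statement)):
            continue
        # Unbound positions expand to every equivalent URI; bound ones keep the queried URI.
        options = [equivalents(got) if want is None else [asked]
                   for want, got, asked in zip(rep_pattern, statement, pattern)]
        yield from product(*options)

print(sorted(match((None, ":b", ":c"))))
# [(':a', ':b', ':c'), (':d', ':b', ':c'), (':e', ':b', ':c')]
print(list(match((":e", None, None))))
# [(':e', ':b', ':c')]
```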
Removing owl:sameAs statements
When owl:sameAs statements are removed from the repository, some nodes may become detached from their class, the class may split into two or more classes, or it may disappear altogether. To determine what happens to the classes in each particular case, it is necessary to track the original owl:sameAs statements and which of them remain in the repository. All statements coming from the user (either through a SPARQL query or through the RDF4J API) are marked as explicit, and every statement derived from them during inference is marked as inferred. So, by knowing which explicit owl:sameAs statements remain, the equivalence classes can be rebuilt.
Note
It is not necessary to rebuild all classes, only the ones referred to by the removed owl:sameAs statements.
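Rebuilding only the affected classes can be sketched as a connected-components computation over the remaining explicit owl:sameAs statements. This is a hypothetical Python model: `rebuild` and `components` are illustrative names, and classes are modelled as sets of nodes. A node left without any owl:sameAs link is dropped from its class, which covers both the "detached node" and the "class disappears" cases.

```python
from collections import defaultdict

def components(nodes, edges):
    """Connected components of the undirected graph formed by owl:sameAs edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(component)
    return result

def rebuild(classes, explicit_same_as, removed):
    """Recompute only the classes that contain a node of a removed owl:sameAs statement."""
    touched = {node for edge in removed for node in edge}
    untouched = [c for c in classes if not c & touched]
    affected = set().union(*(c for c in classes if c & touched))
    remaining = [e for e in explicit_same_as if e not in removed and e[0] in affected]
    # A node left without any owl:sameAs link no longer belongs to a class.
    rebuilt = [c for c in components(sorted(affected), remaining) if len(c) > 1]
    return untouched + rebuilt

classes = [{":a", ":d", ":e"}, {":x", ":y"}]
explicit_same_as = [(":a", ":d"), (":d", ":e"), (":x", ":y")]
after = rebuild(classes, explicit_same_as, removed={(":d", ":e")})
print(sorted(sorted(c) for c in after))  # [[':a', ':d'], [':x', ':y']]
```

The class {:x, :y} is never recomputed because no removed statement mentions its members.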
When nodes are removed from classes, or when classes split or disappear, the new classes (or the removal of classes) yield new representatives. Statements using the old representatives must therefore be replaced with statements using the new ones. This is also achieved by knowing which statements are explicit. The representative statements (i.e., statements that use representative nodes) are flagged as a special type of statement that may cease to exist after changes to the equivalence classes. New representative statements are built from the explicit statements and the new state of the equivalence classes, and only the statements affected by a change need to be processed (e.g., it is not necessary to process all statements when only a single equivalence class has changed). Note that the representative statements, although volatile, are visible to SPARQL queries and to the inferencer. In contrast, the explicit statements that use nodes from the equivalence classes remain invisible and are only used for rebuilding the representative statements.
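Here is a hypothetical Python sketch of refreshing representative statements after a class change. It assumes that both the old and the new node-to-representative mappings are known, and that `changed_nodes` lists every member of the changed classes. The function names and the choice of :d as the new representative are illustrative only.

```python
def representative_statements(explicit, rep_of):
    """Rewrite explicit triples with the current representatives; duplicates collapse."""
    return {tuple(rep_of.get(node, node) for node in triple) for triple in explicit}

def refresh(current, explicit, old_rep_of, new_rep_of, changed_nodes):
    """Replace only the representative statements affected by a class change.

    changed_nodes must contain every member of the classes that changed.
    """
    old_reps = {old_rep_of.get(node, node) for node in changed_nodes}
    # Drop the volatile statements that use an outdated representative.
    kept = {t for t in current if not any(node in old_reps for node in t)}
    # Rebuild them from the explicit statements that mention a changed node.
    affected = [t for t in explicit if any(node in changed_nodes for node in t)]
    return kept | representative_statements(affected, new_rep_of)

# Class {:a, :d, :e} (representative :a) splits: :a is detached, {:d, :e} remains.
old_rep_of = {":a": ":a", ":d": ":a", ":e": ":a"}
new_rep_of = {":d": ":d", ":e": ":d"}   # assumed new representative :d
explicit = [(":a", ":b", ":c"), (":e", ":b", ":f"), (":x", ":q", ":y")]

current = representative_statements(explicit, old_rep_of)
updated = refresh(current, explicit, old_rep_of, new_rep_of, {":a", ":d", ":e"})
print(sorted(updated))
# [(':a', ':b', ':c'), (':d', ':b', ':f'), (':x', ':q', ':y')]
```

The statement (:x :q :y) is carried over untouched. Only the statements derived from the split class are rebuilt.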
Disabling the owl:sameAs support
By default, the owl:sameAs support is enabled in all rulesets except empty (without inference). However, disabling the owl:sameAs behaviour may be beneficial in some cases. For example, it can save time, or you may want to visualise your data without the statements that owl:sameAs generates in queries and inferences.
To disable owl:sameAs, use:
- (for individual queries) the FROM onto:disable-sameAs system graph;
- (for the whole repository) the disable-sameAs configuration parameter (boolean, defaults to false). This disables all inference based on owl:sameAs.
Disabling owl:sameAs in a query does not remove the inferences that have already taken place because of owl:sameAs.
Consider the following example:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT DATA {
<urn:A> owl:sameAs <urn:B> .
<urn:A> a <urn:Class1> .
<urn:B> a <urn:Class2> .
}
As a result, once the intersection of the two classes is defined, <urn:A> and <urn:B> become instances of it:
PREFIX : <http://test.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT DATA {
:Intersection owl:intersectionOf (<urn:Class1> <urn:Class2>) .
}
If you then query for the instances of the intersection:
PREFIX : <http://test.com/>
SELECT * {
?s a :Intersection .
}
the response will be <urn:A> and <urn:B>. Using FROM onto:disable-sameAs returns only the equivalence class representative (e.g., <urn:A>), but it does not disable the inference as a whole.
In contrast, if you set up a repository with the disable-sameAs parameter set to true, the inference <urn:A> a :Intersection will not take place. Then, if you query for the instances of the intersection, the query will return neither <urn:A> nor <urn:B>.
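The difference between the two ways of disabling owl:sameAs can be reproduced with a toy Python simulation of the example above. This is a hypothetical model, not the GraphDB inferencer. It hard-codes the intersection rule, and it treats the representative as the holder of all the types of its class, which is a simplification.

```python
EXPLICIT_TYPES = {"urn:A": {"urn:Class1"}, "urn:B": {"urn:Class2"}}
CLASS = ["urn:A", "urn:B"]                    # urn:A entered the class first
INTERSECTION_OF = {"urn:Class1", "urn:Class2"}

def intersection_instances(repo_disable_same_as=False, query_disable_same_as=False):
    if repo_disable_same_as:
        # No equivalence class: each node keeps only its own types.
        types = EXPLICIT_TYPES
        members = {node: [node] for node in EXPLICIT_TYPES}
    else:
        # The types of all members are attached to the representative.
        rep = CLASS[0]
        types = {rep: set().union(*(EXPLICIT_TYPES[n] for n in CLASS))}
        members = {rep: CLASS}
    result = []
    for node, node_types in types.items():
        if INTERSECTION_OF <= node_types:     # inferred: node a :Intersection
            # FROM onto:disable-sameAs skips enumerating the other class members.
            result += [node] if query_disable_same_as else members[node]
    return sorted(result)

print(intersection_instances())                            # ['urn:A', 'urn:B']
print(intersection_instances(query_disable_same_as=True))  # ['urn:A']
print(intersection_instances(repo_disable_same_as=True))   # []
```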
Apart from this difference in scope, disabling owl:sameAs through the repository parameter and through a FROM clause in the query behaves the same way.
How disable-sameAs interacts with the different rulesets
The following parameters can affect the owl:sameAs behaviour:
- ruleset – owl:sameAs support is enabled for all rulesets except the empty ruleset. Switching to a non-empty ruleset (e.g., owl-horst-optimized) enables the inference. If the inference is then run again, the results show all inferred statements as well as the ones generated by owl:sameAs. They do not include any <P a rdf:Property> and <X a rdfs:Resource> statements (see GraphDB ruleset usage optimisation).
- disable-sameAs: true + inference – disables the owl:sameAs expansion but still shows the other implicit statements. However, these results differ from the ones retrieved with owl:sameAs + inference or with no inference.
- FROM onto:disable-sameAs – including this clause in a query produces different results with different rulesets.
- FROM onto:explicit – using only this clause (or combining it with FROM onto:disable-sameAs) produces the same results as when the inferencer is disabled (as with the empty ruleset). This means that the ruleset and the disable-sameAs parameter do not affect the results.
- FROM onto:explicit + FROM onto:implicit – produces the same results as if both clauses were omitted.
- FROM onto:implicit – using this clause returns only the statements derived by the inferencer. Therefore, with the empty ruleset, it is expected to produce no results.
- FROM onto:implicit + FROM onto:disable-sameAs – shows all inferred statements except the ones generated by owl:sameAs.
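The statement selection described in this list can be summarised by a small hypothetical Python model. Each statement carries two flags: whether it is explicit, and whether it was produced by the owl:sameAs expansion. The names `Stmt` and `select` are illustrative. The store holds only an excerpt of the Example 1 closure, and the model does not cover the effects of switching rulesets.

```python
from typing import NamedTuple

class Stmt(NamedTuple):
    triple: tuple
    explicit: bool              # asserted by the user vs. derived by the inferencer
    from_same_as: bool = False  # produced by the owl:sameAs expansion

def select(store, explicit=False, implicit=False, disable_same_as=False):
    """Mimic FROM onto:explicit / onto:implicit / onto:disable-sameAs."""
    # Using both onto:explicit and onto:implicit is the same as using neither.
    everything = explicit == implicit
    result = set()
    for st in store:
        if not everything and st.explicit != explicit:
            continue  # keep only explicit (or only inferred) statements
        if disable_same_as and st.from_same_as and not st.explicit:
            continue  # hide inferences generated by owl:sameAs
        result.add(st.triple)
    return result

# Excerpt of the Example 1 closure (owl-horst-optimized, disable-sameAs=false).
store = [
    Stmt((":a", ":b", ":c"), True),
    Stmt((":a", "owl:sameAs", ":d"), True),
    Stmt((":d", "owl:sameAs", ":e"), True),
    Stmt((":b", "rdf:type", "rdf:Property"), False),
    Stmt((":b", "rdfs:subPropertyOf", ":b"), False),
    Stmt((":a", "owl:sameAs", ":e"), False, True),
    Stmt((":d", ":b", ":c"), False, True),
    Stmt((":e", ":b", ":c"), False, True),
]

print(sorted(select(store, implicit=True, disable_same_as=True)))
# [(':b', 'rdf:type', 'rdf:Property'), (':b', 'rdfs:subPropertyOf', ':b')]
print(len(select(store, disable_same_as=True)))  # 5, as in Example 3
```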
The following examples illustrate this behaviour:
Example 1
If you use owl:sameAs with the following statements:
PREFIX : <http://test.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT DATA {
:a :b :c .
:a owl:sameAs :d .
:d owl:sameAs :e .
}
and you want to retrieve data with this query:
PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>
DESCRIBE :a :b :c :d :e
the result is the same as querying for explicit statements when there is no inference, or as adding FROM onto:explicit.
However, if you enable the inference, you will see a completely different picture. For example, with owl-horst-optimized and disable-sameAs=false, you will receive the following results:
:a :b :c .
:a owl:sameAs :a .
:a owl:sameAs :d .
:a owl:sameAs :e .
:b a rdf:Property .
:b rdfs:subPropertyOf :b .
:d owl:sameAs :a .
:d owl:sameAs :d .
:d owl:sameAs :e .
:e owl:sameAs :a .
:e owl:sameAs :d .
:e owl:sameAs :e .
:d :b :c .
:e :b :c .
Example 2
If you start with the empty ruleset and then switch to owl-horst-optimized:
PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
_:b sys:addRuleset "owl-horst-optimized" .
_:b sys:defaultRuleset "owl-horst-optimized" .
}
and compute the full inference closure:
PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA {
_:b sys:reinfer _:b .
}
the same DESCRIBE query will return:
:a :b :c .
:a owl:sameAs :a .
:a owl:sameAs :d .
:a owl:sameAs :e .
:d owl:sameAs :a .
:d owl:sameAs :d .
:d owl:sameAs :e .
:e owl:sameAs :a .
:e owl:sameAs :d .
:e owl:sameAs :e .
:d :b :c .
:e :b :c .
i.e., without the <P a rdf:Property> and <P rdfs:subPropertyOf P> statements.
Example 3
If you start with owl-horst-optimized and set the disable-sameAs parameter to true, or use FROM onto:disable-sameAs, you will receive:
:a :b :c .
:a owl:sameAs :d .
:b a rdf:Property .
:b rdfs:subPropertyOf :b .
:d owl:sameAs :e .
i.e., the explicit statements plus the <P a rdf:Property> and <P rdfs:subPropertyOf P> statements.
Example 4
This query:
PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>
DESCRIBE :a :b :c :d :e
FROM onto:implicit
FROM onto:disable-sameAs
yields:
:b a rdf:Property .
:b rdfs:subPropertyOf :b .
because onto:implicit excludes the explicit statements, and onto:disable-sameAs hides all owl:sameAs statements and the statements generated from them (<:d :b :c>, <:e :b :c>).
Note
The same is achieved with the disable-sameAs repository parameter set to true. However, if you start with the empty ruleset and then switch to a non-empty ruleset, this query will not return any results. If you start with owl-horst-optimized and then switch to empty, the <P a rdf:Property> and <P rdfs:subPropertyOf P> statements will persist, i.e., the query will return some results.
Example 5
If you use named graphs, the results look different:
PREFIX : <http://test.com/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT DATA {
GRAPH :graph {
:a :b :c .
:a owl:sameAs :d .
:d owl:sameAs :e .
}
}
Then the test query will be:
PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>
SELECT DISTINCT *
{
GRAPH ?g {
?s ?p ?o
FILTER (
?s IN (:a, :b, :c, :d, :e, :graph) ||
?p IN (:a, :b, :c, :d, :e, :graph) ||
?o IN (:a, :b, :c, :d, :e, :graph) ||
?g IN (:a, :b, :c, :d, :e, :graph)
)
}
}
If you have started with owl-horst-optimized and disable-sameAs=false, you will receive:
:graph {
:a :b :c .
:a owl:sameAs :d .
:d owl:sameAs :e .
}
because the statements from the default graph are not automatically included. This is the same as in the DESCRIBE query, where using both FROM onto:explicit and FROM onto:implicit cancels them out.
So, if you want to see all the statements, you should write:
PREFIX : <http://test.com/>
PREFIX onto: <http://www.ontotext.com/>
SELECT DISTINCT *
FROM NAMED onto:explicit
FROM NAMED onto:implicit
{
GRAPH ?g {
?s ?p ?o
FILTER (
?s IN (:a, :b, :c, :d, :e, :graph) ||
?p IN (:a, :b, :c, :d, :e, :graph) ||
?o IN (:a, :b, :c, :d, :e, :graph) ||
?g IN (:a, :b, :c, :d, :e, :graph)
)
}
}
ORDER BY ?g ?s
Note that when querying quads, you should use the FROM NAMED clause, and when querying triples, the FROM clause. Using FROM NAMED with triples or FROM with quads has no effect. The query above returns the following:
:graph {
:a :b :c .
:a owl:sameAs :d .
:d owl:sameAs :e .
}
onto:implicit {
:b a rdf:Property .
:b rdfs:subPropertyOf :b .
}
onto:implicit {
:a owl:sameAs :a .
:a owl:sameAs :d .
:a owl:sameAs :e .
:d owl:sameAs :a .
:d owl:sameAs :d .
:d owl:sameAs :e .
:e owl:sameAs :a .
:e owl:sameAs :d .
:e owl:sameAs :e .
}
onto:implicit {
:d :b :c .
:e :b :c .
}
In this case, the explicit statements <:a owl:sameAs :d> and <:d owl:sameAs :e> also appear as implicit. They do not appear twice when querying triples because the iterators return unique triples. When querying quads, however, you can see all statements.
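The deduplication effect can be illustrated with plain Python collections. This is a hypothetical model in which each quad records the graph it was returned in. Projecting the quads to triples through a set mimics an iterator that returns unique triples.

```python
# Each owl:sameAs triple paired with the graph it is reported in.
quads = [
    (":a", "owl:sameAs", ":d", ":graph"),
    (":a", "owl:sameAs", ":d", "onto:implicit"),
    (":d", "owl:sameAs", ":e", ":graph"),
    (":d", "owl:sameAs", ":e", "onto:implicit"),
    (":a", "owl:sameAs", ":e", "onto:implicit"),
]

# A triple iterator returns unique triples, so the graph position is dropped.
triples = {(s, p, o) for s, p, o, _graph in quads}

print(len(quads))    # 5: each graph is reported separately
print(len(triples))  # 3: duplicates across graphs collapse
```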
Here, FROM NAMED onto:explicit, FROM NAMED onto:implicit and FROM NAMED onto:disable-sameAs have the same effects as their FROM counterparts described above. The <P a rdf:Property> statements also behave in the same way.