Delete optimizations

GraphDB’s inference policy is based on materialization: implicit statements are inferred from explicit statements as soon as they are inserted into the repository, using the specified semantics ruleset. This makes query answering very fast, since no inference needs to be done at query time.

However, no justification information is stored for inferred statements, so deleting a statement normally requires a full recomputation of all inferred statements. This can take a very long time for large datasets.

GraphDB handles the deletion of explicit statements and their inferences with a special technique called “smooth delete”. It allows fast delete operations and ensures that schemas can still be changed when necessary.

The algorithm

The algorithm for identifying and removing the inferred statements that can no longer be derived once the explicit statements have been deleted is as follows:

  1. Use forward chaining to determine what statements can be inferred from the statements marked for deletion.
  2. Use backward chaining to see if these statements are still supported by other means.
  3. Delete the explicit statements and the inferred statements that are no longer supported.
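
The three steps above can be sketched on a toy in-memory store. This is only an illustration under stated assumptions, not GraphDB's implementation: the ruleset is reduced to rdfs2 and rdfs9, the explicit and inferred sets are assumed to be disjoint, and step 2 re-checks support by trying one-step re-derivation from the surviving statements until nothing more can be restored, rather than by true backward chaining. All function names (apply_rules, one_step, materialize, smooth_delete) and the ex: identifiers are hypothetical.

```python
from typing import Iterator, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, DOMAIN, SUBCLASS = "rdf:type", "rdfs:domain", "rdfs:subClassOf"


def apply_rules(t1: Triple, t2: Triple) -> Iterator[Triple]:
    """Toy ruleset: conclusions of the ordered premise pair (t1, t2)."""
    (x, a, y), (s, p, z) = t1, t2
    if s == a and p == DOMAIN:                  # rdfs2
        yield (x, TYPE, z)
    if a == TYPE and s == y and p == SUBCLASS:  # rdfs9
        yield (x, TYPE, z)


def one_step(premise: Triple, facts: Set[Triple]) -> Set[Triple]:
    """Everything derivable by a single rule firing that uses `premise`."""
    out: Set[Triple] = set()
    for other in facts:
        out.update(apply_rules(premise, other))
        out.update(apply_rules(other, premise))
    return out


def materialize(explicit: Set[Triple]) -> Set[Triple]:
    """Inferred statements for `explicit`, computed as a naive fixpoint."""
    facts = set(explicit)
    while True:
        new = {c for t in facts for c in one_step(t, facts)} - facts
        if not new:
            return facts - explicit
        facts |= new


def smooth_delete(explicit, inferred, to_delete):
    """Return the (explicit, inferred) sets after deleting `to_delete`."""
    facts = explicit | inferred
    # Step 1: forward chaining from the statements marked for deletion.
    suspects, frontier = set(), list(to_delete)
    while frontier:
        for concl in one_step(frontier.pop(), facts):
            if concl in inferred and concl not in suspects:
                suspects.add(concl)
                frontier.append(concl)
    # Step 2: support check (assumption: one-step re-derivation to a fixpoint).
    remaining = explicit - to_delete
    kept = remaining | (inferred - suspects)
    restored = True
    while restored:
        restored = False
        for stmt in list(suspects):
            if any(stmt in one_step(t, kept) for t in kept):
                kept.add(stmt)
                suspects.discard(stmt)
                restored = True
    # Step 3: whatever is still in `suspects` is dropped with the explicit deletions.
    return remaining, kept - remaining


explicit0 = {
    ("foaf:name", DOMAIN, "owl:Thing"),
    ("ex:MyClass", SUBCLASS, "owl:Thing"),
    ("ex:rev", "foaf:name", "Rev"),
    ("ex:rev", TYPE, "ex:MyClass"),
}
inferred0 = materialize(explicit0)
# [ex:rev rdf:type owl:Thing] has two independent derivations, so it survives this delete.
explicit1, inferred1 = smooth_delete(explicit0, inferred0, {("ex:rev", TYPE, "ex:MyClass")})
# Removing the second supporting statement leaves it unsupported.
explicit2, inferred2 = smooth_delete(explicit1, inferred1, {("ex:rev", "foaf:name", "Rev")})
```

In this sketch the first delete keeps the inferred statement because rdfs2 still supports it; only the second delete removes it.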

Note

We recommend that you mark the schema statements that the algorithm would otherwise visit as read-only. Almost all delete operations follow inference paths that touch schema statements, which in turn lead to almost all other statements in the repository, so without this the smooth delete can take a very long time. Since a read-only statement cannot be deleted, there is no reason to find what statements are inferred from it (such inferred statements might still get deleted, but they will be found by following other inference paths).

Statements are marked as read-only if they occur in the Axioms section of the ruleset files (standard or custom) or are loaded at initialization time via the imports configuration parameter.

Note

When using ‘smooth delete’, we recommend that you load all ontology/schema/vocabulary statements using the imports configuration parameter.

Example

Consider the following statements:

Schema:
<foaf:name> <rdfs:domain> <owl:Thing> .
<MyClass> <rdfs:subClassOf> <owl:Thing> .

Data:
<wayne_rooney> <foaf:name> "Wayne Rooney" .
<Reviewer40476> <rdf:type> <MyClass> .
<Reviewer40478> <rdf:type> <MyClass> .
<Reviewer40480> <rdf:type> <MyClass> .
<Reviewer40481> <rdf:type> <MyClass> .

When using the owl-horst ruleset, the removal of the statement:

<wayne_rooney> <foaf:name> "Wayne Rooney"

will cause the following sequence of events:

rdfs2:
x a y - (x=<wayne_rooney>, a=foaf:name, y="Wayne Rooney")
a rdfs:domain z - (a=foaf:name, z=owl:Thing)
-----------------------
x rdf:type z - The inferred statement [<wayne_rooney> rdf:type owl:Thing] is to be removed.

rdfs3:
x a u - (x=<wayne_rooney>, a=rdf:type, u=owl:Thing)
a rdfs:range z - (a=rdf:type, z=rdfs:Class)
-----------------------
u rdf:type z - The inferred statement [owl:Thing rdf:type rdfs:Class] is to be removed.

rdfs8_10:
x rdf:type rdfs:Class - (x=owl:Thing)
-----------------------
x rdfs:subClassOf x - The inferred statement [owl:Thing rdfs:subClassOf owl:Thing] is to be removed.

proton_TransitiveOver:
y q z - (y=owl:Thing, q=rdfs:subClassOf, z=owl:Thing)
p protons:transitiveOver q - (p=rdf:type, q=rdfs:subClassOf)
x p y - (x=[<Reviewer40476>, <Reviewer40478>, <Reviewer40480>, <Reviewer40481>], p=rdf:type, y=owl:Thing)
-----------------------
x p z - The inferred statements [<Reviewer40476> rdf:type owl:Thing], etc., are to be removed.

Statements such as [<Reviewer40476> rdf:type owl:Thing] exist because of the statements [<Reviewer40476> rdf:type <MyClass>] and [<MyClass> rdfs:subClassOf owl:Thing].

In large datasets, there are typically millions of statements [X rdf:type owl:Thing], and they are all visited by the algorithm.

The [X rdf:type owl:Thing] statements are not the only problematic statements considered for removal. Every class that has millions of instances leads to similar behavior.

One check to see if a statement is still supported requires about 30 query evaluations with owl-horst, hence the slow removal.

If [owl:Thing rdf:type owl:Class] is marked as an axiom (because it is derived from schema statements, which must themselves be axioms), the process stops when it reaches this statement. The schema (the system statements) must therefore be imported through the imports configuration parameter, so that the schema statements are marked as axioms.
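
The effect of read-only schema statements on the forward-chaining step can be illustrated with a small simulation of the example. It is a rough sketch under assumptions, not GraphDB's code: the ruleset is cut down to rdfs2, rdfs3, rdfs8_10 and a hard-coded "rdf:type over rdfs:subClassOf" rule, two range axioms stand in for the real ruleset's Axioms section, the read-only set is taken to be everything derivable from the schema alone, and the names consequences, closure, forward_visit and the ex: prefix are hypothetical.

```python
from typing import Iterator, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"
SUBCLASS, CLASS, THING = "rdfs:subClassOf", "rdfs:Class", "owl:Thing"


def consequences(premise: Triple, facts: Set[Triple]) -> Iterator[Triple]:
    """One-step conclusions of the toy ruleset that use `premise`."""
    x, a, y = premise
    if a == TYPE and y == CLASS:
        yield (x, SUBCLASS, x)                  # rdfs8_10
    for s, p, z in facts:
        if s == a and p == DOMAIN:
            yield (x, TYPE, z)                  # rdfs2, premise is the data triple
        if s == a and p == RANGE:
            yield (y, TYPE, z)                  # rdfs3, premise is the data triple
        if a == TYPE and s == y and p == SUBCLASS:
            yield (x, TYPE, z)                  # rdf:type over rdfs:subClassOf
        if a == DOMAIN and p == x:
            yield (s, TYPE, y)                  # rdfs2, premise is the schema triple
        if a == RANGE and p == x:
            yield (z, TYPE, y)                  # rdfs3, premise is the schema triple
        if a == SUBCLASS and p == TYPE and z == x:
            yield (s, TYPE, y)                  # premise is the subClassOf triple


def closure(explicit: Set[Triple]) -> Set[Triple]:
    """Explicit statements plus everything the toy rules infer from them."""
    facts, frontier = set(explicit), list(explicit)
    while frontier:
        for c in list(consequences(frontier.pop(), facts)):
            if c not in facts:
                facts.add(c)
                frontier.append(c)
    return facts


def forward_visit(deleted, explicit, facts, read_only):
    """Inferred statements reached by forward chaining from `deleted`."""
    protected = (explicit - {deleted}) | read_only  # never candidates for removal
    visited, frontier = set(), [deleted]
    while frontier:
        for c in consequences(frontier.pop(), facts):
            if c != deleted and c not in protected and c not in visited:
                visited.add(c)
                frontier.append(c)
    return visited


schema = {
    ("foaf:name", DOMAIN, THING),
    ("ex:MyClass", SUBCLASS, THING),
    (TYPE, RANGE, CLASS),       # assumed axiom
    (SUBCLASS, RANGE, CLASS),   # assumed axiom
}
reviewers = (40476, 40478, 40480, 40481)
deleted = ("ex:wayne_rooney", "foaf:name", "Wayne Rooney")
explicit = schema | {deleted} | {(f"ex:Reviewer{n}", TYPE, "ex:MyClass") for n in reviewers}
facts = closure(explicit)

slow = forward_visit(deleted, explicit, facts, read_only=set())
fast = forward_visit(deleted, explicit, facts, read_only=closure(schema))
print(len(slow), "statements visited without read-only schema,", len(fast), "with it")
```

Without the read-only set, the traversal passes through [owl:Thing rdf:type rdfs:Class] and reaches every [X rdf:type owl:Thing] statement; with it, the traversal stops right after the statement about ex:wayne_rooney. Each visited statement would then need its own support check, which is where the cost comes from.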

Schema transactions

As mentioned above, ontologies and schemas imported at initialization time using the imports configuration parameter are flagged as read-only. However, there are times when it is necessary to change a schema. This can be done inside a ‘system transaction’.

The user instructs GraphDB that the transaction is a system transaction by including a dummy statement with the special schemaTransaction predicate, i.e.:

_:b1 <http://www.ontotext.com/owlim/system#schemaTransaction> _:b2

This statement is not inserted into the database; it serves only as a flag telling GraphDB that the statements in this transaction are to be inserted as read-only. All statements derived from them are also marked as read-only. When you delete statements in a system transaction, you can remove statements marked as read-only, as well as statements derived from them. Axiom statements and all statements derived from them stay untouched.
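
As a minimal sketch of how the flag statement might be bundled with schema changes, the helper below builds the text of a SPARQL INSERT DATA update that starts with the dummy statement. It assumes the transaction is submitted as a single SPARQL update, which is only one possible way to send it; the helper name schema_insert_update and the example.org IRI are hypothetical, and the code only builds a string without contacting a repository.

```python
from typing import Iterable, Tuple

# Predicate taken from the text above; everything else here is illustrative.
SCHEMA_TX = "<http://www.ontotext.com/owlim/system#schemaTransaction>"


def schema_insert_update(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Build an INSERT DATA update whose first statement is the system-transaction flag.

    Each triple is given as three already-serialized terms (IRIs in angle brackets,
    literals in quotes).
    """
    lines = [f"  _:b1 {SCHEMA_TX} _:b2 ."]
    lines += [f"  {s} {p} {o} ." for s, p, o in triples]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


update = schema_insert_update([
    (
        "<http://example.org/MyClass>",
        "<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
        "<http://www.w3.org/2002/07/owl#Thing>",
    ),
])
print(update)
```

The resulting string can be sent with whatever client you already use for updates; the flag statement and the schema statements must travel in the same transaction.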