Provenance
The provenance plugin generates the inference closure of a specific named graph at query time. This is useful when you want to trace which implicit statements are derived from a specific graph, together with the axiomatic triples that are part of the configured ruleset, i.e., the ones inserted with the special predicate sys:schemaTransaction. For more information, check Reasoning.
By default, GraphDB’s forward-chaining inferencer materializes all implicit statements in the default graph. Therefore, it is impossible to trace which graphs these implicit statements are coming from. The provenance plugin provides the opposite approach. With the configured ruleset, the reasoner does forward-chaining over a specific named graph and generates all its implicit statements at query time.
Predicates
The plugin predicates give you easy access to the graph whose implicit statements you want to generate. The process is similar to RDF reification. All of the plugin’s predicates start with <http://www.ontotext.com/provenance/>:
| Plugin predicates | Semantics |
|---|---|
| | Creates a request scope for the graph with the inference closure |
| | Binds all subjects part of the inference closure |
| | Binds all predicates part of the inference closure |
| | Binds all objects part of the inference closure |
Enabling the plugin
The plugin is disabled by default.
To enable it, start GraphDB with the following parameter:
./graphdb -Dregister-plugins=com.ontotext.trree.plugin.provenance.ProvenancePlugin
Check the startup log to validate that the plugin has started correctly.
[INFO ] 2016-11-18 19:47:19,134 [http-nio-7200-exec-2 c.o.t.s.i.PluginManager] Initializing plugin 'provenance'
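The log check can also be scripted. The following is a minimal sketch, assuming the initialization message keeps the shape of the sample line above; the regular expression and the `initialized_plugins` helper are illustrative names, not part of GraphDB, and other versions may format the line differently.

```python
import re

# Pattern derived from the sample log line above (an assumption, not a
# documented log format).
PLUGIN_INIT = re.compile(r"Initializing plugin '(?P<name>[^']+)'")


def initialized_plugins(log_lines):
    """Return the set of plugin names reported as initializing in the log."""
    names = set()
    for line in log_lines:
        match = PLUGIN_INIT.search(line)
        if match:
            names.add(match.group("name"))
    return names


sample = [
    "[INFO ] 2016-11-18 19:47:19,134 [http-nio-7200-exec-2 "
    "c.o.t.s.i.PluginManager] Initializing plugin 'provenance'",
]
print("provenance" in initialized_plugins(sample))
```

In practice the lines would come from the GraphDB log file, whose location depends on the installation.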
Usage and examples
In the Workbench SPARQL editor, add the following data as a schema transaction:
    PREFIX ex: <http://example.com/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    INSERT DATA {
        [] <http://www.ontotext.com/owlim/system#schemaTransaction> [] .
        ex:BusStop a rdfs:Class .
        ex:SkiResort a rdfs:Class .
        ex:WebCam a rdfs:Class .
        ex:OutdoorWebCam a rdfs:Class .
        ex:Place a rdfs:Class .
        ex:BusStop rdfs:subClassOf ex:Place .
        ex:SkiResort rdfs:subClassOf ex:Place .
        ex:WebCam rdfs:subClassOf ex:Place .
        ex:OutdoorWebCam rdfs:subClassOf ex:WebCam .
    }
Add the following data as a normal transaction:
    PREFIX ex: <http://example.com/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    INSERT DATA {
        GRAPH ex:g1 {
            ex:webcam_g1 a ex:OutdoorWebCam .
            ex:webcam_g1 ex:containedIn ex:skiresort .
        }
        GRAPH ex:g1a {
            ex:webcam_g1a a ex:OutdoorWebCam .
            ex:webcam_g1a ex:containedIn ex:skiresort .
        }
        GRAPH ex:g2 {
            ex:skiresort a ex:SkiResort .
            ex:skiresort ex:publicTransport ex:busstop .
        }
        GRAPH ex:g3 {
            ex:busstop a ex:BusStop .
            ex:busstop ex:nearBy ex:skiresort .
        }
    }
If we run the following query without using the plugin, it will return solutions over all statements, including those that were inferred when the data was loaded:
    PREFIX ex: <http://example.com/>
    SELECT * {
        ?webCam a ex:WebCam .
        ?webCam ex:containedIn ?skiresort .
        ?busstop ex:nearBy ?skiresort .
    }
The result will have two solutions.
The following query showcases the newly introduced Provenance plugin predicate pr:derive. The computed closure is accessible in a dedicated special context whose name is provided as the subject of that predicate pattern. The patterns in the scope of that context are evaluated by the provenance plugin over the closure computed from the selected data. This allows for flexibility of use, as more than one context can be selected to supply data for processing, e.g., pr:derive (ex:g1 ex:g2).

    PREFIX pr: <http://www.ontotext.com/provenance/>
    PREFIX ex: <http://example.com/>
    SELECT * {
        pr:derive1 pr:derive (ex:g1 ex:g2 ex:g3) .
        GRAPH pr:derive1 {
            ?webCam a ex:WebCam .
            ?webCam ex:containedIn ?skiresort .
            ?busstop ex:nearBy ?skiresort .
        }
    }
This will return one solution since only the data from ex:g1, ex:g2, and ex:g3 will be used, so the solution that depends on the data within ex:g1a will not be part of the result.
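The difference between the two results can be reproduced with a toy model. The sketch below is not GraphDB's reasoner: it assumes a single rule (type propagation along rdfs:subClassOf), stores triples as plain Python tuples, and uses illustrative helper names (`closure`, `solutions`). It only shows why restricting the closure to ex:g1, ex:g2, and ex:g3 drops the solution that depends on ex:g1a.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Schema statements from the schema transaction (class declarations omitted).
SCHEMA = {
    ("ex:BusStop", SUBCLASS, "ex:Place"),
    ("ex:SkiResort", SUBCLASS, "ex:Place"),
    ("ex:WebCam", SUBCLASS, "ex:Place"),
    ("ex:OutdoorWebCam", SUBCLASS, "ex:WebCam"),
}

# Explicit statements, keyed by named graph.
GRAPHS = {
    "ex:g1": {("ex:webcam_g1", RDF_TYPE, "ex:OutdoorWebCam"),
              ("ex:webcam_g1", "ex:containedIn", "ex:skiresort")},
    "ex:g1a": {("ex:webcam_g1a", RDF_TYPE, "ex:OutdoorWebCam"),
               ("ex:webcam_g1a", "ex:containedIn", "ex:skiresort")},
    "ex:g2": {("ex:skiresort", RDF_TYPE, "ex:SkiResort"),
              ("ex:skiresort", "ex:publicTransport", "ex:busstop")},
    "ex:g3": {("ex:busstop", RDF_TYPE, "ex:BusStop"),
              ("ex:busstop", "ex:nearBy", "ex:skiresort")},
}


def closure(graph_names):
    """Forward-chain the subclass rule over the schema plus the selected graphs."""
    triples = set(SCHEMA)
    for name in graph_names:
        triples |= GRAPHS[name]
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in triples if p == SUBCLASS]
        new = {(s, RDF_TYPE, sup)
               for s, p, o in triples if p == RDF_TYPE
               for sub, sup in subclass if sub == o}
        if not new <= triples:
            triples |= new
            changed = True
    return triples


def solutions(triples):
    """Evaluate the three triple patterns of the example query."""
    webcams = {s for s, p, o in triples if p == RDF_TYPE and o == "ex:WebCam"}
    return sorted(
        (cam, resort, stop)
        for cam, p1, resort in triples
        if p1 == "ex:containedIn" and cam in webcams
        for stop, p2, near in triples
        if p2 == "ex:nearBy" and near == resort
    )


all_graphs = solutions(closure(GRAPHS))
selected = solutions(closure(["ex:g1", "ex:g2", "ex:g3"]))
print(len(all_graphs), len(selected))  # prints "2 1"
```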
Note
During evaluation, the inferences and the data are kept in memory, so the plugin should be used with relatively small sets of statements placed in contexts.
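Queries of this shape can also be assembled and submitted programmatically. The following is a rough sketch using only the Python standard library and the standard SPARQL protocol; the endpoint URL and the repository name `myrepo` are hypothetical, and `derive_query` and `build_request` are illustrative helpers rather than part of any GraphDB client. The request is built but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: a local GraphDB instance with a repository "myrepo".
ENDPOINT = "http://localhost:7200/repositories/myrepo"


def derive_query(graphs, patterns, scope="pr:derive1"):
    """Build a SELECT query that evaluates `patterns` over the closure of `graphs`."""
    graph_list = " ".join(graphs)
    return (
        "PREFIX pr: <http://www.ontotext.com/provenance/>\n"
        "PREFIX ex: <http://example.com/>\n"
        "SELECT * {\n"
        f"    {scope} pr:derive ({graph_list}) .\n"
        f"    GRAPH {scope} {{\n"
        + "".join(f"        {p} .\n" for p in patterns)
        + "    }\n}"
    )


def build_request(query, endpoint=ENDPOINT):
    """Wrap the query in a SPARQL-protocol POST request (not sent here)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


query = derive_query(
    ["ex:g1", "ex:g2", "ex:g3"],
    ["?webCam a ex:WebCam",
     "?webCam ex:containedIn ?skiresort",
     "?busstop ex:nearBy ?skiresort"],
)
request = build_request(query)
print(query)
# To execute against a running server: urllib.request.urlopen(request)
```

Keeping the graph list as a parameter makes it easy to compare closures over different graph selections while staying within the small, in-memory data sizes the note above recommends.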