Updating Data¶
Overview¶
Updating the content of RDF documents can generally be tricky due to the nature of RDF – no fixed schema or standard notion for management of multi-document graphs. There are two widely employed strategies when it comes to managing RDF documents – storing each RDF document in a single named graph vs. storing each RDF document as a collection of triples where multiple RDF documents exist in the same graph.
The single RDF document per named graph is easy to update - you can simply replace the content of the named graph with the updated document, and GraphDB provides an optimization to do that efficiently. However, when there are multiple documents in a graph and a single document needs to be updated, the old content of the document must be removed first. This is typically done using a handcrafted SPARQL update that deletes only the triples that define the document. This update needs to be the same on every client that updates data in order to get consistent behavior across the system.
GraphDB solves this by enabling smart updates using server-side SPARQL templates. Each template corresponds to a single document type, and defines the SPARQL update that needs to be executed in order to remove the previous content of the document.
To initiate a smart update, the user provides the IRI identifying the template (i.e., the document type) and the IRI identifying the document. The new content of the document is then simply added to the database in any of the supported ways – replace graph, SPARQL INSERT, add statements, etc.
Replace graph¶
A document (the smallest update unit) is defined as the contents of a named graph. Thus, to perform an update, you need to provide the following information:
The IRI of the named graph – the document ID
The new RDF contents of the named graph – the document contents
DELETE/INSERT template¶
A document is defined as all triples for a given document identifier according to a predefined schema. The schema is described as a SPARQL DELETE/INSERT template that can be filled from the provided data at update time. The following must be present at update time:
The SPARQL template update (must be predefined, not provided at update time)
Can be a DELETE WHERE update that only deletes the previous version of the document and the new data is inserted as is.
Can be a DELETE INSERT WHERE update that deletes the previous version of the document and adds additional triples, e.g. timestamp information.
The IRI of the updated document
The new RDF contents of the updated document
Transport mechanisms¶
The transport mechanism defines how users send RDF update data to GraphDB. Two mechanisms are supported - direct access and indirect access via the Kafka Sink connector.
Direct access¶
Direct access is a direct connection to GraphDB using the RDF4J API as well as any GraphDB extensions to that API, e.g. using SPARQL, deleting/adding individual triples, etc.
Replace graph¶
When a replace graph smart update is sent directly to GraphDB, the user does not need to do anything special. A simple CLEAR GRAPH followed by an INSERT into the same graph is enough.
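As an illustration, the replace graph pattern could be generated on the client side. The following is a minimal Python sketch, not part of GraphDB itself: the function name `replace_graph_update` and the graph IRI are hypothetical, and `SILENT` is added as an assumption so that clearing a graph that does not exist yet does not raise an error.

```python
def replace_graph_update(graph_iri: str, triples: list[str]) -> str:
    """Build a SPARQL update that replaces the contents of one named graph.

    `triples` are statements in SPARQL syntax without the trailing dot,
    e.g. '<http://factory/John> <http://factory/hasSalary> 20000'.
    """
    if not triples:
        raise ValueError("at least one triple is required")
    body = " .\n    ".join(triples)
    return (
        f"CLEAR SILENT GRAPH <{graph_iri}> ;\n"
        f"INSERT DATA {{\n"
        f"  GRAPH <{graph_iri}> {{\n"
        f"    {body} .\n"
        f"  }}\n"
        f"}}"
    )


# Hypothetical graph IRI used only for this example.
update = replace_graph_update(
    "http://example.com/graphs/factory-1",
    ["<http://factory/John> <http://factory/hasSalary> 20000"],
)
print(update)
```

The resulting string can be sent as a single SPARQL update, so the clear and the insert travel together in one request.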
DELETE/INSERT template¶
Unlike replace graph, this update mechanism needs a predefined SPARQL template that can be referenced at update time. Once a template has been defined, the user can request its use by inserting a system triple.
Let’s see how such a template can be used.
Create a repository.
In the SPARQL editor, add the following data about two employees in a factory and their salaries:
PREFIX factory: <http://factory/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA {
    factory:1 rdf:type factory:Factory .
    factory:John <http://factory/hasSalary> 10000 ;
        <http://factory/worksIn> factory:1 .
    factory:Luke <http://factory/hasSalary> 10000 ;
        <http://factory/worksIn> factory:1 .
}
If we run a simple SELECT query to get all information about John:
SELECT * WHERE { <http://factory/John> ?p ?o . }
We will get the following result:
Again in the SPARQL editor, create and execute the following template:
INSERT DATA {
    <http://example.com/my-template> <http://www.ontotext.com/sparql/template> '''
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX factory: <http://factory/>
    DELETE {
        ?worker factory:hasSalary ?oldSalary .
    } INSERT {
        ?id factory:updatedOn ?now
    } WHERE {
        ?id rdf:type factory:Factory .
        ?worker factory:worksIn ?id .
        ?worker factory:hasSalary ?oldSalary .
        BIND(now() as ?now)
    }
    '''
}
Next, we execute a smart update to the RDF data, changing the employees’ salaries:
PREFIX onto: <http://www.ontotext.com/>
PREFIX factory: <http://factory/>
INSERT DATA {
    onto:smart-update onto:sparql-template <http://example.com/my-template> ;
        onto:template-binding-id factory:1 .
    factory:John factory:hasSalary 20000 .
    factory:Luke factory:hasSalary 20000 .
}
Now let’s see how the data has changed. Run again:
SELECT * WHERE { <http://factory/John> ?p ?o . }
We can see that John’s salary has increased.
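Since every client has to send the same two system triples with each smart update, it may help to build the update in one place. The following Python sketch is only an illustration: the helper name `smart_update` is hypothetical, while the predicates `onto:sparql-template` and `onto:template-binding-id` are the ones used in the walkthrough above.

```python
ONTO = "http://www.ontotext.com/"


def smart_update(template_iri: str, document_iri: str, triples: list[str]) -> str:
    """Build an INSERT DATA smart update for one document.

    The first two lines select the template and bind the document IRI;
    the remaining lines are the new content of the document.
    `triples` are statements in SPARQL syntax without the trailing dot.
    """
    lines = [
        f"<{ONTO}smart-update> <{ONTO}sparql-template> <{template_iri}> ;",
        f"    <{ONTO}template-binding-id> <{document_iri}> .",
    ]
    lines += [f"{triple} ." for triple in triples]
    return "INSERT DATA {\n  " + "\n  ".join(lines) + "\n}"


update = smart_update(
    "http://example.com/my-template",
    "http://factory/1",
    [
        "<http://factory/John> <http://factory/hasSalary> 20000",
        "<http://factory/Luke> <http://factory/hasSalary> 20000",
    ],
)
print(update)
```

With the arguments shown, the generated update is equivalent to the smart update executed above, written with full IRIs instead of prefixes.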
Indirect access via Kafka Sink connector¶
In this mode, the user pushes update messages to Kafka, and the Kafka Sink connector applies the updates. Users and consumers must agree on the following:
A given Kafka topic is configured to accept RDF updates in a predefined update type and format.
The types of updates that can be performed are: replace graph, DELETE/INSERT template, or simple add.
The format of the data must be one of the RDF formats.
For more details, see Kafka Sink connector.
Updates are performed as follows:
Replace graph¶
The Kafka topic is configured for replace graph.
The Kafka key defines the named graph to update.
The Kafka value defines the contents of the named graph.
DELETE/INSERT template¶
The Kafka topic is configured for a specific template.
The Kafka key defines the document IRI.
The Kafka value defines the new contents of the document.
Simple add¶
The Kafka topic is configured to only add data.
The Kafka key is irrelevant but it is recommended to use a unique ID, e.g. a random UUID.
The Kafka value is the new RDF data to be added.
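The key and value conventions for the three update types can be summarized in a small Python sketch. It only shows how a producer might assemble the key and value of a message; it does not talk to Kafka. The names `KafkaRecord` and `build_record`, and the update type labels `"replace_graph"`, `"template"`, and `"add"`, are hypothetical and are not connector settings.

```python
import uuid
from typing import NamedTuple, Optional


class KafkaRecord(NamedTuple):
    key: bytes
    value: bytes


def build_record(update_type: str, rdf_payload: str, iri: Optional[str] = None) -> KafkaRecord:
    """Assemble the key and value for one RDF update message.

    replace_graph: key is the named graph IRI, value is the graph contents.
    template:      key is the document IRI, value is the new document contents.
    add:           key is irrelevant, so a random UUID is used as recommended.
    """
    if update_type in ("replace_graph", "template"):
        if iri is None:
            raise ValueError(f"{update_type} requires a graph or document IRI")
        key = iri
    elif update_type == "add":
        key = str(uuid.uuid4())
    else:
        raise ValueError(f"unknown update type: {update_type}")
    return KafkaRecord(key.encode("utf-8"), rdf_payload.encode("utf-8"))


record = build_record(
    "template",
    "<http://factory/John> <http://factory/hasSalary> 20000 .",
    iri="http://factory/1",
)
print(record.key, record.value)
```

The payload must be serialized in the RDF format that the topic was configured for. The N-Triples-style string here is just an example.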
SPARQL templates¶
The built-in SPARQL template plugin enables you to create predefined SPARQL templates that can be used for smart updates to the repository data. Templates are created, read, updated, and deleted with regular SPARQL operations, exactly like any other RDF data.
The plugin is defined with the special predicate <http://www.ontotext.com/sparql/template>.
You can create and execute SPARQL templates in the Workbench from both the SPARQL editor and from the SPARQL Templates editor.
From the SPARQL editor¶
Create template¶
We will use the template from the example above. Execute:
INSERT DATA {
<http://example.com/my-template> <http://www.ontotext.com/sparql/template> '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX factory: <http://factory/>
DELETE {
?worker factory:hasSalary ?oldSalary .
} INSERT {
?id factory:updatedOn ?now
} WHERE {
?id rdf:type factory:Factory .
?worker factory:worksIn ?id .
?worker factory:hasSalary ?oldSalary .
bind(now() as ?now)
}
'''
}
Get template content¶
SELECT ?template {
<http://example.com/my-template> <http://www.ontotext.com/sparql/template> ?template
}
This will return the content of the template, in our case:
"
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX factory: <http://factory/>
DELETE {
?worker factory:hasSalary ?oldSalary .
} INSERT {
?id factory:updatedOn ?now
} WHERE {
?id rdf:type factory:Factory .
?worker factory:worksIn ?id .
?worker factory:hasSalary ?oldSalary .
bind(now() as ?now)
}
"
List defined templates¶
SELECT ?id ?template {
?id <http://www.ontotext.com/sparql/template> ?template
}
This will list the IDs of the available templates, in our case http://example.com/my-template, and their content.
Update template¶
We can also update the content of the template with the same update operation from earlier:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT DATA {
<http://example.com/my-template> <http://www.ontotext.com/sparql/template> '''
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX factory: <http://factory/>
DELETE {
?worker factory:hasSalary ?oldSalary .
} INSERT {
?id factory:updatedOn ?now
} WHERE {
?id rdf:type factory:Factory .
?worker factory:worksIn ?id .
?worker factory:hasSalary ?oldSalary .
bind(now() as ?now)
}
'''
}
Delete template¶
DELETE WHERE {
<http://example.com/my-template> <http://www.ontotext.com/sparql/template> ?template
}
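The four template operations above differ only in the template IRI and body, so a client could generate them from a few helpers. This is a minimal Python sketch under that assumption: the function names are hypothetical, and only the predicate <http://www.ontotext.com/sparql/template> comes from the plugin.

```python
TEMPLATE_PREDICATE = "http://www.ontotext.com/sparql/template"


def create_template(template_iri: str, body: str) -> str:
    """INSERT DATA statement that creates (or updates) a template."""
    if "'''" in body:
        # The body is wrapped in a ''' long string literal, so it must not contain one.
        raise ValueError("template body must not contain '''")
    return f"INSERT DATA {{\n  <{template_iri}> <{TEMPLATE_PREDICATE}> '''{body}'''\n}}"


def get_template(template_iri: str) -> str:
    """SELECT query that returns the body of one template."""
    return f"SELECT ?template {{\n  <{template_iri}> <{TEMPLATE_PREDICATE}> ?template\n}}"


def list_templates() -> str:
    """SELECT query that lists all template IDs with their bodies."""
    return f"SELECT ?id ?template {{\n  ?id <{TEMPLATE_PREDICATE}> ?template\n}}"


def delete_template(template_iri: str) -> str:
    """DELETE WHERE statement that removes one template."""
    return f"DELETE WHERE {{\n  <{template_iri}> <{TEMPLATE_PREDICATE}> ?template\n}}"


print(get_template("http://example.com/my-template"))
```

Each helper returns a string only. Sending it to the repository is left to whichever SPARQL client is already in use.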
From the SPARQL Templates editor¶
For ease of use, the GraphDB Workbench also offers a separate menu tab where you can define your templates.
Go to the SPARQL Templates editor. A default example template will open.
The template ID is required and must be an IRI. We will use the example from earlier: http://example.com/my-template. If you enter an invalid IRI, the SPARQL template editor will warn you of it.
The template body contains a default template. Replace it with:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX factory: <http://factory/>
DELETE {
    ?worker factory:hasSalary ?oldSalary .
} INSERT {
    ?id factory:updatedOn ?now
} WHERE {
    ?id rdf:type factory:Factory .
    ?worker factory:worksIn ?id .
    ?worker factory:hasSalary ?oldSalary .
    bind(now() as ?now)
}
This template can be used for smart updates to the RDF data as shown above.
Save the template. It will now be visible in the list with created templates where you can also edit or delete it.
SPARQL Template endpoint¶
In some cases, you may want to execute arbitrary SPARQL updates, storing not the variables but rather the relationship between those variables and the database. An easy way to do that is through the Workbench REST API SPARQL template endpoint. Let’s see how this is done.
First, we need to import some data with which we will be working.
Import the following sample data describing five fictitious wines:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix wine: <http://www.ontotext.com/example/wine#> .

wine:RedWine rdfs:subClassOf wine:Wine .
wine:WhiteWine rdfs:subClassOf wine:Wine .
wine:RoseWine rdfs:subClassOf wine:Wine .

wine:Merlo
    rdf:type wine:Grape ;
    rdfs:label "Merlo" .

wine:CabernetSauvignon
    rdf:type wine:Grape ;
    rdfs:label "Cabernet Sauvignon" .

wine:CabernetFranc
    rdf:type wine:Grape ;
    rdfs:label "Cabernet Franc" .

wine:PinotNoir
    rdf:type wine:Grape ;
    rdfs:label "Pinot Noir" .

wine:Chardonnay
    rdf:type wine:Grape ;
    rdfs:label "Chardonnay" .

wine:Yoyowine
    rdf:type wine:RedWine ;
    wine:madeFromGrape wine:CabernetSauvignon ;
    wine:hasSugar "dry" ;
    wine:hasYear "2013"^^xsd:integer .

wine:Franvino
    rdf:type wine:RedWine ;
    wine:madeFromGrape wine:Merlo ;
    wine:madeFromGrape wine:CabernetFranc ;
    wine:hasSugar "dry" ;
    wine:hasYear "2012"^^xsd:integer .

wine:Noirette
    rdf:type wine:RedWine ;
    wine:madeFromGrape wine:PinotNoir ;
    wine:hasSugar "medium" ;
    wine:hasYear "2012"^^xsd:integer .

wine:Blanquito
    rdf:type wine:WhiteWine ;
    wine:madeFromGrape wine:Chardonnay ;
    wine:hasSugar "dry" ;
    wine:hasYear "2012"^^xsd:integer .

wine:Rozova
    rdf:type wine:RoseWine ;
    wine:madeFromGrape wine:PinotNoir ;
    wine:hasSugar "medium" ;
    wine:hasYear "2013"^^xsd:integer .
After that, let’s create the SPARQL template.
Go to the SPARQL Templates editor and create the following template:
PREFIX wine: <http://www.ontotext.com/example/wine#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE {
    ?s wine:hasSugar ?oldValue .
    ?s wine:hasYear ?oldYear
} INSERT {
    ?s wine:hasSugar ?sugar .
    ?s wine:hasYear ?year .
} WHERE {
    ?s ?p ?oldValue .
    ?s ?p1 ?oldYear .
}
Let’s run a SPARQL query against the data. In the SPARQL editor, execute:
PREFIX wine: <http://www.ontotext.com/example/wine#>
SELECT ?s ?p ?o
WHERE {
    BIND(wine:Blanquito as ?s) .
    ?s ?p ?o .
}
The following results will be returned:
Example 1:
To change the values of the variables for sugar content and year, we will update the data through the REST API endpoint.
Go to the SPARQL template endpoint of the Workbench REST API.
For the X-GraphDB-Repository parameter, enter the name of your repository, e.g. “my_repo”.
In the document field, enter the JSON document:
{
    "sugar" : "none",
    "year" : 2020,
    "s" : "http://www.ontotext.com/example/wine#Blanquito"
}
Click Try it out.
To see how the data have been updated, execute the SPARQL query for wine:Blanquito from earlier again.
We can see that the objects for the sugar content and year predicates have been updated to “none” and “2020”, respectively.
Here, we executed a template and added specific values for its variables. Even though we did not specify a type for 2020, we get a typed result: "2020"^^xsd:int. This is because standard IRIs, numbers, and boolean values are recognized and parsed this way.
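Outside the Workbench, the same call could be prepared from a script. The Python sketch below only builds the request and does not send it. The endpoint URL is a placeholder, because the exact path is not given in this document, and the `Content-Type` header is an assumption. Only the `X-GraphDB-Repository` parameter and the JSON document come from the example above. How the template to execute is selected is also not shown here and has to be taken from the endpoint's own documentation.

```python
import json
import urllib.request


def build_execute_request(endpoint_url: str, repository: str, bindings: dict) -> urllib.request.Request:
    """Prepare a POST request that carries the variable bindings as JSON."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(bindings).encode("utf-8"),
        method="POST",
        headers={
            "X-GraphDB-Repository": repository,
            "Content-Type": "application/json",  # assumption, not stated in the source
        },
    )


# Placeholder URL: replace it with the actual SPARQL template endpoint.
request = build_execute_request(
    "http://localhost:7200/rest/placeholder-template-endpoint",
    "my_repo",
    {
        "sugar": "none",
        "year": 2020,
        "s": "http://www.ontotext.com/example/wine#Blanquito",
    },
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Because `year` is serialized as a JSON number rather than a string, it is one of the values that the endpoint can recognize and type automatically, as described above.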
Example 2:
We can also create typed values explicitly by using JSON-LD-like values.
We will be using the same SPARQL template as in example 1.
Again through the REST API endpoint, send:
{
    "sugar" : { "@id" : "custom:iri" },
    "year" : { "@value" : "2020", "@type" : "http://test.type" },
    "s" : "http://www.ontotext.com/example/wine#Blanquito"
}
Most IRIs will be recognized, but some custom ones will not. Here, we are using a special label @id so that the value for sugar can be parsed as an IRI, since the value custom:iri will not be considered an IRI by default.
To see how the data have been updated, execute the query from example 1 in the SPARQL editor. The returned results will be:
As shown in the first example, the values will get a type if recognized. If we have a value not in its default type, we can use JSON-LD-like values containing both the @value and the @type. Here, this is demonstrated with the year variable: the result is "2020"^^<http://test.type>.
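The two JSON-LD-like forms can be produced with small helpers so that explicit IRIs and typed literals are not written by hand. This is a minimal Python sketch: the helper names `iri` and `typed` are hypothetical, while the `@id`, `@value`, and `@type` keys are the ones described above.

```python
import json


def iri(value: str) -> dict:
    """Mark a value as an IRI even if it would not be recognized as one by default."""
    return {"@id": value}


def typed(value: str, datatype: str) -> dict:
    """Attach an explicit datatype IRI to a literal value."""
    return {"@value": value, "@type": datatype}


document = {
    "sugar": iri("custom:iri"),
    "year": typed("2020", "http://test.type"),
    "s": "http://www.ontotext.com/example/wine#Blanquito",
}
print(json.dumps(document, indent=2))
```

Plain strings, numbers, and booleans can still be passed as they are. The helpers are only needed where the default recognition would give the wrong result.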