Updating Data

Overview

Updating the content of RDF documents can be tricky due to the nature of RDF – there is no fixed schema and no standard notion of managing multi-document graphs. Two strategies are widely employed for managing RDF documents: storing each RDF document in its own named graph, or storing each RDF document as a collection of triples where multiple documents coexist in the same graph.

A single RDF document per named graph is easy to update – you simply replace the content of the named graph with the updated document, and GraphDB provides an optimization to do that efficiently. However, when multiple documents live in the same graph and a single document needs to be updated, the old content of that document must be removed first. This is typically done with a handcrafted SPARQL update that deletes only the triples that define the document. Every client that updates data must use the same update in order to get consistent behavior across the system.

GraphDB solves this by enabling smart updates using server-side SPARQL templates. Each template corresponds to a single document type, and defines the SPARQL update that needs to be executed in order to remove the previous content of the document.

To initiate a smart update, the user provides the IRI identifying the template (i.e., the document type) and the IRI identifying the document. The new content of the document is then simply added to the database in any of the supported ways – replace graph, SPARQL INSERT, add statements, etc.

Replace graph

A document (the smallest update unit) is defined as the contents of a named graph. Thus, to perform an update, you need to provide the following information:

  • The IRI of the named graph – the document ID

  • The new RDF contents of the named graph – the document contents

DELETE/INSERT template

A document is defined as all triples for a given document identifier according to a predefined schema. The schema is described as a SPARQL DELETE/INSERT template that can be filled from the provided data at update time. The following must be present at update time:

  • The SPARQL template update (must be predefined, not provided at update time)

    • Can be a DELETE WHERE update that only deletes the previous version of the document and the new data is inserted as is.

    • Can be a DELETE INSERT WHERE update that deletes the previous version of the document and adds additional triples, e.g. timestamp information.

  • The IRI of the updated document

  • The new RDF contents of the updated document

Transport mechanisms

The transport mechanism defines how users send RDF update data to GraphDB. Two mechanisms are supported - direct access and indirect access via the Kafka Sink connector.

Direct access

Direct access is a direct connection to GraphDB using the RDF4J API as well as any GraphDB extensions to that API, e.g. using SPARQL, deleting/adding individual triples, etc.

Replace graph

When a replace graph smart update is sent directly to GraphDB, nothing special is required – e.g., a simple CLEAR GRAPH followed by an INSERT into the same graph is sufficient.
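For example, assuming the document lives in a hypothetical named graph <http://factory/graph-1>, the whole update can be sent as a single SPARQL request:

```sparql
# <http://factory/graph-1> is a hypothetical graph IRI used for illustration
PREFIX factory: <http://factory/>

# Drop the old contents of the document's graph...
CLEAR GRAPH <http://factory/graph-1> ;
# ...and insert the new contents in the same request
INSERT DATA {
    GRAPH <http://factory/graph-1> {
        factory:John factory:hasSalary 20000 ;
                     factory:worksIn factory:1 .
    }
}
```

Both operations are part of one SPARQL update request and are executed in order, so the old contents are replaced by the new in a single round trip.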

DELETE/INSERT template

Unlike replace graph, this update mechanism needs a predefined SPARQL template that can be referenced at update time. Once a template has been defined, the user can request its use by inserting a system triple.

Let’s see how such a template can be used.

  1. Create a repository.

  2. In the SPARQL editor, add the following data about two employees in a factory and their salaries:

    PREFIX factory: <http://factory/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    INSERT DATA {
        factory:1 rdf:type factory:Factory .
    factory:John factory:hasSalary 10000 ;
                 factory:worksIn factory:1 .
    factory:Luke factory:hasSalary 10000 ;
                 factory:worksIn factory:1 .
    }
    
  3. If we run a simple SELECT query to get all information about John:

    SELECT * WHERE {
        <http://factory/John> ?p ?o .
    }
    

    We will get the following result:

    _images/smart-updates-result1.png
  4. Again in the SPARQL editor, create and execute the following template:

    INSERT DATA {
    <http://example.com/my-template> <http://www.ontotext.com/sparql/template> '''
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX factory: <http://factory/>
        DELETE {
            ?worker factory:hasSalary ?oldSalary .
        } INSERT {
            ?id factory:updatedOn ?now
        } WHERE {
            ?id rdf:type factory:Factory .
            ?worker factory:worksIn ?id .
            ?worker factory:hasSalary ?oldSalary .
            BIND(now() as ?now)
        }
        '''
    }
    
  5. Next, we execute a smart update to the RDF data, changing the employees’ salaries:

    PREFIX onto: <http://www.ontotext.com/>
    PREFIX factory: <http://factory/>
    INSERT DATA {
        onto:smart-update onto:sparql-template <http://example.com/my-template> ;
                          onto:template-binding-id factory:1 .
        factory:John factory:hasSalary 20000 .
        factory:Luke factory:hasSalary 20000 .
    }
    
  6. Now let’s see how the data has changed. Run again:

    SELECT * WHERE {
        <http://factory/John> ?p ?o .
    }
    

    We can see that John’s salary has increased.

    _images/smart-updates-result2.png

Indirect access via Kafka Sink connector

In this mode, the user pushes update messages to Kafka, and the Kafka Sink connector applies the updates to GraphDB. The producers and consumers of updates must agree on the following:

  • A given Kafka topic is configured to accept RDF updates in a predefined update type and format.

  • The types of updates that can be performed are: replace graph, DELETE/INSERT template, or simple add.

  • The format of the data must be one of the supported RDF serialization formats.

For more details, see Kafka Sink connector.

Updates are performed as follows:

Replace graph

  • The Kafka topic is configured for replace graph.

  • The Kafka key defines the named graph to update.

  • The Kafka value defines the contents of the named graph.

DELETE/INSERT template

  • The Kafka topic is configured for a specific template.

  • The Kafka key defines the document IRI.

  • The Kafka value defines the new contents of the document.

Simple add

  • The Kafka topic is configured to only add data.

  • The Kafka key is irrelevant but it is recommended to use a unique ID, e.g. a random UUID.

  • The Kafka value is the new RDF data to be added.
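As an illustration, a replace graph message might carry the following parts (the topic name and serialization are assumptions, not fixed by GraphDB):

```
topic: factory-documents        # a hypothetical topic configured for replace graph
key:   http://factory/graph-1   # the named graph to replace
value: <http://factory/John> <http://factory/hasSalary> "20000" .   # new RDF contents, e.g. in N-Triples
```

For a topic configured with a DELETE/INSERT template, the key would instead be the document IRI, and for a simple add topic the key is ignored.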

SPARQL templates

The built-in SPARQL template plugin enables you to create predefined SPARQL templates that can be used for smart updates to the repository data. Templates are stored as ordinary triples, so creating, reading, updating, and deleting them behaves exactly like any other RDF data operation.

The plugin is defined with the special predicate <http://www.ontotext.com/sparql/template>.

You can create and execute SPARQL templates in the Workbench from both the SPARQL editor and from the SPARQL Templates editor.

From the SPARQL editor

Create template

We will use the template from the example above. Execute:

INSERT DATA {
    <http://example.com/my-template> <http://www.ontotext.com/sparql/template> '''
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX factory: <http://factory/>
        DELETE {
            ?worker factory:hasSalary ?oldSalary .
        } INSERT {
            ?id factory:updatedOn ?now
        } WHERE {
            ?id rdf:type factory:Factory .
            ?worker factory:worksIn ?id .
            ?worker factory:hasSalary ?oldSalary .
            BIND(now() as ?now)
        }
        '''
    }

Get template content

SELECT ?template {
    <http://example.com/my-template> <http://www.ontotext.com/sparql/template> ?template
}

This will return the content of the template, in our case:

"
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX factory: <http://factory/>
    DELETE {
        ?worker factory:hasSalary ?oldSalary .
    } INSERT {
        ?id factory:updatedOn ?now
    } WHERE {
        ?id rdf:type factory:Factory .
        ?worker factory:worksIn ?id .
        ?worker factory:hasSalary ?oldSalary .
        BIND(now() as ?now)
    }
"

List defined templates

SELECT ?id ?template {
    ?id <http://www.ontotext.com/sparql/template> ?template
}

This will list the IDs of the available templates, in our case http://example.com/my-template, and their content.

Update template

We can also update the content of the template with the same update operation from earlier:

INSERT DATA {
    <http://example.com/my-template> <http://www.ontotext.com/sparql/template> '''
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX factory: <http://factory/>
        DELETE {
            ?worker factory:hasSalary ?oldSalary .
        } INSERT {
            ?id factory:updatedOn ?now
        } WHERE {
            ?id rdf:type factory:Factory .
            ?worker factory:worksIn ?id .
            ?worker factory:hasSalary ?oldSalary .
            BIND(now() as ?now)
        }
    '''
}

Delete template

DELETE WHERE {
    <http://example.com/my-template> <http://www.ontotext.com/sparql/template> ?template
}

From the SPARQL Templates editor

For ease of use, the GraphDB Workbench also offers a separate menu tab where you can define your templates.

  1. Go to Setup ‣ SPARQL Templates ‣ Create new SPARQL template. A default example template will open.

  2. The template ID is required and must be an IRI. We will use the example from earlier: http://example.com/my-template.

    If you enter an invalid IRI, the SPARQL template editor will warn you of it.

  3. The template body contains a default template. Replace it with:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX factory: <http://factory/>
    DELETE {
        ?worker factory:hasSalary ?oldSalary .
    } INSERT {
        ?id factory:updatedOn ?now
    } WHERE {
        ?id rdf:type factory:Factory .
        ?worker factory:worksIn ?id .
        ?worker factory:hasSalary ?oldSalary .
        BIND(now() as ?now)
    }
    

    This template can be used for smart updates to the RDF data as shown above.

  4. Save the template. It will now be visible in the list with created templates where you can also edit or delete it.

    _images/kafka-sink-update-template.png

SPARQL Template endpoint

In some cases, you may want to execute a predefined SPARQL template directly, supplying concrete values for its variables instead of writing the update by hand. An easy way to do that is through the GraphDB REST API SPARQL template endpoint. Let’s see how this is done.

  1. First, we need to import some data with which we will be working.

    Go to Import ‣ User data ‣ Import RDF text snippet and import the following sample data describing five fictitious wines:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix wine: <http://www.ontotext.com/example/wine#> .
    
    wine:RedWine rdfs:subClassOf wine:Wine .
    wine:WhiteWine rdfs:subClassOf wine:Wine .
    wine:RoseWine rdfs:subClassOf wine:Wine .
    
    wine:Merlo
        rdf:type wine:Grape ;
        rdfs:label "Merlo" .
    
    wine:CabernetSauvignon
        rdf:type wine:Grape ;
        rdfs:label "Cabernet Sauvignon" .
    
    wine:CabernetFranc
        rdf:type wine:Grape ;
        rdfs:label "Cabernet Franc" .
    
    wine:PinotNoir
        rdf:type wine:Grape ;
        rdfs:label "Pinot Noir" .
    
    wine:Chardonnay
        rdf:type wine:Grape ;
        rdfs:label "Chardonnay" .
    
    wine:Yoyowine
        rdf:type wine:RedWine ;
        wine:madeFromGrape wine:CabernetSauvignon ;
        wine:hasSugar "dry" ;
        wine:hasYear "2013"^^xsd:integer .
    
    wine:Franvino
        rdf:type wine:RedWine ;
        wine:madeFromGrape wine:Merlo ;
        wine:madeFromGrape wine:CabernetFranc ;
        wine:hasSugar "dry" ;
        wine:hasYear "2012"^^xsd:integer .
    
    wine:Noirette
        rdf:type wine:RedWine ;
        wine:madeFromGrape wine:PinotNoir ;
        wine:hasSugar "medium" ;
        wine:hasYear "2012"^^xsd:integer .
    
    wine:Blanquito
        rdf:type wine:WhiteWine ;
        wine:madeFromGrape wine:Chardonnay ;
        wine:hasSugar "dry" ;
        wine:hasYear "2012"^^xsd:integer .
    
    wine:Rozova
        rdf:type wine:RoseWine ;
        wine:madeFromGrape wine:PinotNoir ;
        wine:hasSugar "medium" ;
        wine:hasYear "2013"^^xsd:integer .
    
  2. After that, let’s create the SPARQL template.

    Go to Setup ‣ SPARQL Templates ‣ Create new SPARQL template and create the following template:

    PREFIX wine: <http://www.ontotext.com/example/wine#>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    
    DELETE {
        ?s wine:hasSugar ?oldValue .
        ?s wine:hasYear ?oldYear
    } INSERT {
        ?s wine:hasSugar ?sugar .
        ?s wine:hasYear ?year .
    } WHERE {
        ?s wine:hasSugar ?oldValue .
        ?s wine:hasYear ?oldYear .
    }
    
  3. Let’s run a SPARQL query against the data. In the SPARQL editor, execute:

    PREFIX wine: <http://www.ontotext.com/example/wine#>
    
    SELECT ?s ?p ?o
    WHERE {
        BIND(wine:Blanquito as ?s) .
        ?s ?p ?o .
    }
    

    The following results will be returned:

    _images/sparql-template-endpoint-results1.png
  4. Example 1:

    To change the values of the variables for sugar content and year, we will update the data through the REST API endpoint.

    1. Go to Help ‣ REST API ‣ GraphDB Workbench API ‣ SPARQL Template Controller ‣ POST /rest/repositories/{repositoryID}/sparql-templates/execute.

    2. For the repositoryID parameter, enter the name of your repository, e.g. “my_repo”.

    3. In the document field, enter the JSON document:

      {
          "sugar" : "none" ,
          "year" : 2020 ,
          "s" : "http://www.ontotext.com/example/wine#Blanquito"
      }
      
    4. Click Try it out.

      _images/sparql-template-endpoint-update1.png
    5. To see how the data have been updated, let’s execute the SPARQL query from step 3 again:

      _images/sparql-template-endpoint-results2.png

      We can see that the objects for the sugar content and year predicates have been updated to “none” and “2020”, respectively.

      Here, we executed a template and added specific values for its variables. Even if we had not specified the type for 2020, we would get a typed result: "2020"^^xsd:int. This is because standard IRIs, numbers, and boolean values are recognized and parsed this way.

  5. Example 2:

    We can also create typed values explicitly by using JSON-LD-like values.

    1. We will be using the same SPARQL template as in example 1.

    2. Again in Help ‣ REST API ‣ GraphDB Workbench API ‣ SPARQL Template Controller ‣ POST /rest/repositories/{repositoryID}/sparql-templates/execute, send:

      {
          "sugar" : { "@id" : "custom:iri" } ,
          "year" : { "@value" : "2020" , "@type" : "http://test.type" } ,
          "s" : "http://www.ontotext.com/example/wine#Blanquito"
      }
      

      Most IRIs will be recognized, but some custom ones will not. Here, we are using a special label @id so that the value for sugar can be parsed as an IRI, since the value custom:iri will not be considered an IRI by default.

    3. To see how the data have been updated, execute the query from example 1 in the SPARQL editor. The returned results will be:

      _images/sparql-template-endpoint-results3.png

      As shown in the first example, the values will get a type if recognized. If we have a value not in its default type, we can use JSON-LD-like values containing both the @value and the @type. Here, this is demonstrated with the year variable - the result is "2020"^^<http://test.type>.
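Outside the Workbench, the same endpoint can be called with any HTTP client. Below is a minimal sketch with curl, repeating the update from example 1 and assuming a local GraphDB instance on port 7200, a repository named my_repo, and a templateID request parameter for selecting the template (these names are assumptions – check the REST API view in your GraphDB version for the exact host, repository, and parameter names):

```shell
# Hypothetical host, repository name, and parameter name – adjust to your setup
curl -X POST \
  'http://localhost:7200/rest/repositories/my_repo/sparql-templates/execute?templateID=http%3A%2F%2Fexample.com%2Fmy-template' \
  -H 'Content-Type: application/json' \
  -d '{
        "sugar": "none",
        "year": 2020,
        "s": "http://www.ontotext.com/example/wine#Blanquito"
      }'
```

The JSON body carries the variable bindings, exactly as entered in the Workbench document field.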