Data history and versioning

What the plugin does

The Data history and versioning plugin lets you access past states of your database by versioning data at the RDF data model level. Collecting and querying the history of a database is beneficial for users and organizations that want to preserve all of their historical data and that often face a common use case: “I want to know when a value in the database changed, and what the state of the system was before that change.”

The plugin remembers changes from multiple transactions and provides the means to track historical data. Changes in the repository are tracked globally for all users and all updates can be queried and processed at once. The tracked data is persisted to disk and is available after a restart.

It is useful in several kinds of situations, such as:

  • Generating a “diff” between generations while data updates are loaded into the system on a regular basis, either through ETL or a change data stream;

  • Answering the question of what has changed between moment A and moment B, for example: “After an application change was implemented over the weekend, I need to compare the deployment footprint or configuration of the before/after situation”;

  • Maintaining history only for specific classes or properties, i.e., no need for keeping history for everything. This is a significant advantage when working with very large databases, the querying of which would require substantial amounts of time and system resources;

  • Searching for the members of a specific team at point X.

Warning

Querying the history log may be slow when the log is large. If you have a big repository, we recommend using filters to reduce the number of history entries.

Index components

The plugin index is of the type DSPOCI, meaning that it consists of the following components:

  • Date-time - a 64-bit long value that represents the exact time an operation occurred with millisecond precision. All operations in the same transaction have the same date-time value.

  • Subject - the statement subject, 32 or 40 bits long.

  • Predicate - the statement predicate, 32 or 40 bits long.

  • Object - the statement object, 32 or 40 bits long.

  • Context - the statement context, 32 or 40 bits long. Special values are used for explicit statements in the default graph and for implicit statements. By including the implicit statements, we get transparent support for transactions.

  • Insert - a boolean value stored in as few bits as is practical. True represents an INSERT, and false represents a DELETE.
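To read the date-time values in the example that follows, it can help to convert them to calendar dates. The sketch below assumes that the 64-bit value counts milliseconds since the Unix epoch (1970-01-01T00:00:00Z). The text above states only the millisecond precision, so the epoch is an assumption, and the function name is purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the date-time component counts milliseconds since the Unix epoch.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def history_timestamp_to_utc(timestamp_ms: int) -> datetime:
    """Convert a millisecond timestamp to a timezone-aware UTC datetime."""
    # timedelta arithmetic keeps the millisecond part exact (no float rounding).
    return UNIX_EPOCH + timedelta(milliseconds=timestamp_ms)


# First date-time value from the example index rows.
print(history_timestamp_to_utc(1570623056397).isoformat(timespec="milliseconds"))
```

Under that assumption, all operations of one transaction map to the same instant, since they share one date-time value.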

The index is ordered by each component going from left to right, where the date-time component is ordered in descending order (most recent updates come first), and all other components are ordered in ascending order. For example:

Date-time        Subject   Predicate   Object   Context   Insert
1570623056397    34        1           29       -3        TRUE
1570623056397    34        1           38       -2        TRUE
1570623042812    34        1           30       -2        FALSE
1570623042812    34        2           31       -2        FALSE

Tip

Due to the order of the index components, the most time-efficient way to query your data is first by date-time and then by subject. This is particularly relevant when using the hist:parameters predicate, as described in the examples below.
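The ordering rule can be expressed as a sort key. The sketch below is not the plugin's storage code; it only reproduces the described order (date-time descending, every other component ascending) on the four example rows. The class and function names are illustrative.

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    """One history index row; the field names are illustrative."""
    timestamp_ms: int
    subject: int
    predicate: int
    obj: int
    context: int
    insert: bool


def dspoci_sort_key(entry: HistoryEntry) -> tuple:
    # Negating the timestamp yields descending date-time order;
    # the remaining components compare in ascending order.
    return (-entry.timestamp_ms, entry.subject, entry.predicate,
            entry.obj, entry.context, entry.insert)


# The example rows, deliberately shuffled.
unordered = [
    HistoryEntry(1570623042812, 34, 2, 31, -2, False),
    HistoryEntry(1570623056397, 34, 1, 38, -2, True),
    HistoryEntry(1570623042812, 34, 1, 30, -2, False),
    HistoryEntry(1570623056397, 34, 1, 29, -3, True),
]

ordered = sorted(unordered, key=dspoci_sort_key)
for entry in ordered:
    print(*entry)
```

Because date-time is the leading component, a lookup restricted to a time range touches one contiguous slice of this order, which is consistent with the advice to query by date-time first.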

Usage

Enable/disable plugin

Enabling and disabling the plugin affects only the collection of history, which is disabled by default. Querying the history that has already been collected is possible at any moment.

To enable the plugin, execute the following query:

INSERT DATA {
    [] <http://www.ontotext.com/at/enabled> true
}

To disable it, execute:

INSERT DATA {
    [] <http://www.ontotext.com/at/enabled> false
}

To check the current enabled status, execute:

SELECT ?enabled {
    [] <http://www.ontotext.com/at/enabled> ?enabled
}

Clear all data

If you want to clear all data in your repository, you should first disable collecting history, as there is no way to have usable history after this operation has been executed. For example:

  • You try to execute CLEAR ALL while history is being collected, and get an error. The reason is that clearing all statements in the repository is incompatible with collecting history. Disable collecting history if you really want to clear all data.

  • You disable collecting history and retry CLEAR ALL: All data in the repository is deleted. All history data is deleted as well, since whatever is there is no longer usable.

History filtering

Keeping history for everything is usually unnecessary and consumes considerable time and resources, so the plugin lets you restrict history collection to certain classes or properties by means of filters. Each filter must specify four mandatory positions: subject, predicate, object, and context. Each position can have one of the following values:

  • * - everything is allowed

  • IRI, BNode or Literal - the entity in this position must be of the specified type (the keyword is case-insensitive)

  • an IRI - only this IRI is allowed

  • an IRI prefix (http://myIRI*) - all IRIs that start with the given prefix are allowed

Filter examples

  • * * literal *: match statements that have any literal in the object position

  • * http://example.com/name * *: match statements whose predicate is http://example.com/name

  • http://example.com/person/* * * *: match statements whose subject is an IRI starting with http://example.com/person/

A statement is kept in the history if it matches at least one of the provided statement templates.
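The matching rules above can be sketched as follows. This is an illustration of the described semantics, not the plugin's implementation: the (kind, value) term representation and all function names are assumptions, and the behavior when no filters are configured is not modeled.

```python
TYPE_KEYWORDS = {"iri", "bnode", "literal"}


def position_matches(pattern: str, kind: str, value: str) -> bool:
    """Check one template position against a term given as (kind, value)."""
    if pattern == "*":
        return True                                   # everything is allowed
    if pattern.lower() in TYPE_KEYWORDS:
        return kind == pattern.lower()                # type check, case-insensitive
    if pattern.endswith("*"):
        return kind == "iri" and value.startswith(pattern[:-1])  # IRI prefix
    return kind == "iri" and value == pattern         # exact IRI


def statement_matches(template: str, statement: tuple) -> bool:
    """A template has four space-separated positions: subject predicate object context."""
    patterns = template.split()
    if len(patterns) != 4:
        raise ValueError("a filter needs exactly four positions")
    return all(position_matches(p, kind, value)
               for p, (kind, value) in zip(patterns, statement))


def is_tracked(filters: list, statement: tuple) -> bool:
    """Kept in the history if at least one template matches."""
    return any(statement_matches(f, statement) for f in filters)


# Hypothetical statement in a hypothetical named graph.
statement = (
    ("iri", "http://example.com/person/1"),
    ("iri", "http://example.com/name"),
    ("literal", "Alice"),
    ("iri", "http://example.com/graph"),
)
print(is_tracked(["* http://example.com/age * *", "* * LITERAL *"], statement))
```

In this example the first template does not match, but the second one does, so the statement would be kept.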

Manage filters

  • Add filter

    INSERT DATA {
        [] <http://www.ontotext.com/at/addFilters> "* * LITERAL *"
    }
    
  • Remove filter

    INSERT DATA {
        [] <http://www.ontotext.com/at/removeFilters> "* * LITERAL *"
    }
    
  • List filters

    SELECT ?filter WHERE {
        [] <http://www.ontotext.com/at/getFilters> ?filter
    }
    

Query process and examples

  1. Enable the plugin:

    INSERT DATA {
        [] <http://www.ontotext.com/at/enabled> true
    }
    
  2. Insert the data you want to query:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    INSERT DATA {
        <urn:Human> rdfs:subClassOf <urn:Mammal> .
        <urn:Commander> rdfs:subClassOf <urn:StarfleetOfficer> .
        <urn:Captain> rdfs:subClassOf <urn:StarfleetOfficer> .
        <urn:Kirk> a <urn:Human> ;
            <urn:dateOfBirth> "2233-03-22"^^xsd:date ;
            <urn:name> "James T. Kirk" ;
            <urn:rank> <urn:Commander> .
    }
    

    Change the name of a particular Starfleet officer, so that you can then see how this change is tracked:

    DELETE DATA { <urn:Kirk> <urn:name> "James T. Kirk" };
    INSERT DATA { <urn:Kirk> <urn:name> "James Tiberius Kirk" }
    
  3. Query the history of your data:

    1. Find out the specific point in time when data was changed by browsing the history with the following query:

      PREFIX hist: <http://www.ontotext.com/at/>
      SELECT * {
          ?log a hist:history ;
           hist:timestamp ?time ;
           hist:graph ?g ;
           hist:subject ?s ;
           hist:predicate ?p ;
           hist:object ?o ;
           hist:insert ?i
      }
      

      The retrieved results are in descending order, i.e., the most recent change comes first:

      _images/history_specific_point_time.png
    2. You can also find out what changes were made for a subject and a predicate within a specific time period between moment A and moment B. This is done with the hist:parameters predicate, used in the following way: ?log hist:parameters (?fromDateTime ?toDateTime ?subject ?predicate ?object ?context).

    While the hist:parameters predicate is not mandatory, passing parameters when querying history is much more efficient than fetching all history entries and then filtering them. When the predicate is present, only history entries that match the list are returned. The order of the parameters is significant. Only bound parameters are used for matching, and unbound ones are ignored. Not all parameters are required, but since the list is ordered, a parameter can be given only if all positions before it are present: to filter by subject, for example, the list must contain at least ?fromDateTime ?toDateTime ?subject. ?fromDateTime and ?toDateTime may be left unbound.

    The following query returns all changes made within a given time period:

    PREFIX hist: <http://www.ontotext.com/at/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT * {
        ?log a hist:history ;
         hist:parameters ("2020-01-17T14:38:50"^^xsd:dateTime "2020-01-17T15:00:00"^^xsd:dateTime);
         hist:timestamp ?time ;
         hist:graph ?g ;
         hist:subject ?s ;
         hist:predicate ?p ;
         hist:object ?o ;
         hist:insert ?i
    }
    
    _images/history_all_changes_for_time_period.png

    You can also find out all changes for a particular subject and predicate. Note that the ?fromDateTime ?toDateTime parameters are left unbound.

    PREFIX hist: <http://www.ontotext.com/at/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?time ?s ?p ?o ?i {
        ?log a hist:history ;
         hist:parameters (?fromDateTime ?toDateTime <urn:Kirk> <urn:name> ?object ?context);
         hist:timestamp ?time ;
         hist:graph ?g ;
         hist:subject ?s ;
         hist:predicate ?p ;
         hist:object ?o ;
         hist:insert ?i
    }
    
    _images/history_all_changes_for_subject_predicate.png
    3. You can query the data as it was at a specific point in time by including FROM <http://www.ontotext.com/at/xxx>, where xxx is a date-time in the format: yyyy[[[[[MM]dd]HH]mm]ss]. For example:

    # Return data as it looked on 2020-01-17 14:38:55 server time
    #
    SELECT ?name ?rank ?dateOfBirth FROM <http://www.ontotext.com/at/20200117143855> {
        bind(<urn:Kirk> as ?officer)
        ?officer <urn:name> ?name ;
                <urn:rank> ?rank ;
                <urn:dateOfBirth> ?dateOfBirth .
    }
    
    _images/history_entry_at_specific_time.png

    The same query also works when only the date is specified:

    # Return data as it looked on 2020-01-17 00:00:00 server time
    # (explicit year, month, and day only)
    #
    SELECT ?name ?rank ?dateOfBirth FROM <http://www.ontotext.com/at/20200117> {
        bind(<urn:Kirk> as ?officer)
        ?officer <urn:name> ?name ;
                <urn:rank> ?rank ;
                <urn:dateOfBirth> ?dateOfBirth .
    }
    

    To retrieve all data for that particular Starfleet officer at a specific point in time, you can also use a DESCRIBE query:

    DESCRIBE <urn:Kirk> FROM <http://www.ontotext.com/at/20200117143855>
    

    The result from our example at that point in time would be:

    _images/history_describe.png
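When point-in-time queries are generated by a client program, the graph IRI in the FROM clause can be built from a date-time value. The sketch below is a hypothetical helper, not part of any GraphDB client library. It always emits the full yyyyMMddHHmmss form and assumes that the datetime passed in is already expressed in server time, as the comments in the queries above indicate.

```python
from datetime import datetime

# Prefix taken from the FROM clauses shown above.
HISTORY_GRAPH_PREFIX = "http://www.ontotext.com/at/"


def history_graph_iri(moment: datetime) -> str:
    """Build the point-in-time graph IRI in the full yyyyMMddHHmmss form.

    Assumption: `moment` is already in the server's local time.
    """
    return HISTORY_GRAPH_PREFIX + moment.strftime("%Y%m%d%H%M%S")


def describe_at(resource_iri: str, moment: datetime) -> str:
    """Return a DESCRIBE query for a resource as of the given moment."""
    return f"DESCRIBE <{resource_iri}> FROM <{history_graph_iri(moment)}>"


print(describe_at("urn:Kirk", datetime(2020, 1, 17, 14, 38, 55)))
```

The printed query is the same DESCRIBE query as in the example above, so the helper can be checked against it.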

Note

Statements that have history will use the history data according to the requested point in time. Statements that do not have history will be returned directly, assuming they were never modified and existed at the requested point as well.
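The ordering rule for the hist:parameters list described earlier (positions are fixed, trailing ones may be dropped, earlier ones may stay unbound) is easy to get wrong when queries are assembled by hand. The following sketch shows one way to build the list. The function is hypothetical and expects each bound value to be an already serialized SPARQL term.

```python
# Position names follow the parameter list given in the text.
PARAMETER_NAMES = ("fromDateTime", "toDateTime", "subject",
                   "predicate", "object", "context")


def build_parameters(*terms):
    """Render the object list for hist:parameters.

    Each argument is a serialized SPARQL term, or None to leave that
    position unbound. Trailing unbound positions are dropped.
    """
    if len(terms) > len(PARAMETER_NAMES):
        raise ValueError("at most six parameters are supported")
    kept = list(terms)
    while kept and kept[-1] is None:
        kept.pop()                       # trailing unbound positions are optional
    if not kept:
        raise ValueError("nothing to filter on; omit hist:parameters instead")
    # Unbound positions before a bound one become placeholder variables.
    rendered = [term if term is not None else "?" + name
                for term, name in zip(kept, PARAMETER_NAMES)]
    return "(" + " ".join(rendered) + ")"


print(build_parameters(None, None, "<urn:Kirk>", "<urn:name>"))
```

For the subject and predicate example this yields the two unbound date-time variables followed by <urn:Kirk> and <urn:name>, which matches the earlier query except that the trailing ?object ?context variables are omitted.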