Data history and versioning

What the plugin does

The Data history and versioning plugin enables you to access past states of your database through versioning at the RDF data model level. Collecting and querying the history of a database benefits users and organizations that want to preserve all of their historical data and that often face a common use case: "I want to know when a value in the database changed, and what the state of the system was before that."

The plugin remembers changes from multiple transactions and provides the means to track historical data. Changes in the repository are tracked globally for all users and all updates can be queried and processed at once. The tracked data is persisted to disk and is available after a restart.

The plugin is useful in several types of scenarios, such as:

  • Generating a “diff” between generations while data updates are loaded into the system on a regular basis, either through ETL or a change data stream;

  • Answering the question of what has changed between moment A and moment B, for example: “After an application change was implemented over the weekend, I need to compare the deployment footprint or configuration of the before/after situation”;

  • Maintaining history only for specific classes or properties instead of for everything. This is a significant advantage when working with very large databases, where querying the full history would require substantial time and system resources;

  • Searching for the members of a specific team at point X.

Warning

Querying the history log may be slow when the log is large. If you have a big repository, we recommend using filters to reduce the number of history entries.

Usage

Enable/disable plugin

Enabling and disabling the plugin affects only the collection of history. Collection is disabled by default. Querying the collected history is possible at any moment.

To enable the plugin, execute the following query:

INSERT DATA {
    [] <http://www.ontotext.com/at/enabled> true
}

To disable it, execute:

INSERT DATA {
    [] <http://www.ontotext.com/at/enabled> false
}

To check the current enabled status, execute:

SELECT ?enabled {
    [] <http://www.ontotext.com/at/enabled> ?enabled
}
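The updates above are ordinary SPARQL updates, so any client can send them. The Python sketch below builds a standard SPARQL 1.1 Protocol update request (a form-encoded POST with an update parameter) using only the standard library. It constructs the request but does not send it. Several details are assumptions to adjust for your installation: the server address http://localhost:7200, the repository ID myrepo, and the RDF4J-style /repositories/{id}/statements endpoint. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

ENABLED = "<http://www.ontotext.com/at/enabled>"


def history_toggle_update(enabled: bool) -> str:
    # Same update as shown above, with the boolean written as a SPARQL literal.
    return "INSERT DATA { [] %s %s }" % (ENABLED, "true" if enabled else "false")


def build_update_request(base_url: str, repo_id: str, update: str) -> urllib.request.Request:
    # Assumed endpoint layout: <base_url>/repositories/<repo_id>/statements
    url = "%s/repositories/%s/statements" % (base_url.rstrip("/"), repo_id)
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_update_request("http://localhost:7200", "myrepo", history_toggle_update(True))
# urllib.request.urlopen(req) would send the update; it is not called here.
```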

Transaction IDs

Each transaction can be identified both by its timestamp and by a transaction ID. The transaction ID is an IRI and, by default, is derived from the timestamp.

Setting your own transaction IDs

You can assign a persistent IRI of your choosing as the transaction ID for a given transaction. That IRI is then used as the subject of all triples that record this transaction in the data history, which makes querying and otherwise managing transaction data much easier.

The transaction ID must be a valid IRI, and it must be one that has not already been used as a transaction ID in the collected history.

The following example uses the hist:transactionId predicate to assign the IRI http://example.com/t14 to a SPARQL update. The update replaces a triple that holds a resource’s given name with another triple that has a new value for that name:

PREFIX hist: <http://www.ontotext.com/at/>

INSERT DATA {
    [] hist:transactionId <http://example.com/t14> .
};

DELETE DATA {
    <urn:Kirk> <urn:givenName> "Jim" .
};

INSERT DATA {
    <urn:Kirk> <urn:givenName> "James" .
}

The statement that sets the transaction ID must be the first change in the transaction.
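When updates are generated by an application, the ordering rule is easy to enforce in code. The Python sketch below composes an update in the same shape as the example above, with the hist:transactionId statement always placed first. The function name and the used_ids set are hypothetical. The set is a client-side record only; the plugin remains the authority on whether an ID has already been used. The IRI check is deliberately rough and rejects only characters that SPARQL does not allow inside <...>.

```python
from typing import Iterable, Set

HIST = "http://www.ontotext.com/at/"


def with_transaction_id(tx_iri: str, operations: Iterable[str], used_ids: Set[str]) -> str:
    # Rough sanity check: needs a scheme separator and no characters forbidden in IRIREF.
    if ":" not in tx_iri or any(c in tx_iri for c in ' <>"{}|\\^`'):
        raise ValueError("not a usable IRI: %r" % tx_iri)
    if tx_iri in used_ids:
        raise ValueError("transaction ID already used: %s" % tx_iri)
    # The statement that sets the transaction ID must be the first change.
    first = "INSERT DATA {\n    [] hist:transactionId <%s> .\n}" % tx_iri
    return "PREFIX hist: <%s>\n%s" % (HIST, ";\n".join([first, *operations]))


update = with_transaction_id(
    "http://example.com/t14",
    ['DELETE DATA { <urn:Kirk> <urn:givenName> "Jim" . }',
     'INSERT DATA { <urn:Kirk> <urn:givenName> "James" . }'],
    used_ids=set(),
)
```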

Clear all data

If you want to clear all data in your repository, you must first disable history collection, because the history cannot remain usable after this operation. For example:

  • If you try to execute CLEAR ALL while history collection is enabled, you get an error, because clearing all statements in the repository is incompatible with collecting history. Disable history collection if you really want to clear all data.

  • If you disable history collection and retry CLEAR ALL, all data in the repository is deleted. All history data is deleted as well, since it would no longer be usable.

Clear history

You can also delete only the history without deleting the data in the repository or having to disable collecting history. Execute:

PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
    [] hist:clearHistory [] .
}

Trim history

The history can also be trimmed in various ways:

Delete history before a certain date

PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
    [] hist:trimBefore "2022-11-29"  .
}

The provided literal must be a valid xsd:date or xsd:dateTime value. If only the date is specified, the time is assumed to be midnight (00:00:00). By default, the system timezone is used. For more precise trimming, specify a full date-time.
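The defaulting rules above can be modelled in a few lines of Python. This is a sketch of the described behaviour, not the plugin's parser. trim_before_instant is a hypothetical name, and default_tz stands in for the server's system timezone. Timezone suffixes on plain xsd:date values (such as "2022-11-29Z") are not handled.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def trim_before_instant(literal: str, default_tz: tzinfo) -> datetime:
    if "T" not in literal:
        # xsd:date without a time part: assume midnight in the default timezone.
        return datetime.combine(date.fromisoformat(literal), time(0, 0), tzinfo=default_tz)
    # xsd:dateTime; older Pythons' fromisoformat rejects a trailing "Z".
    dt = datetime.fromisoformat(literal.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=default_tz)  # no offset given: use the default timezone
    return dt


cet = timezone(timedelta(hours=1))
cutoff = trim_before_instant("2022-11-29", cet)  # midnight CET, i.e. 23:00 UTC the day before
```

An explicit offset, as in "2022-11-29T00:00:00+01:00", makes the cutoff independent of where the server runs.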

Trim history by size

Size here means the number of statements in the history log to be preserved.

PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
    [] hist:trimToSize 1000 .
}

Trim the history to a given period from the current date and time

PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
    [] hist:trimToPeriod "P3D" .
}

The provided literal must be a valid xsd:duration value. P3D means 3 days, so only the history from the last 3 days remains after executing the update. Other duration components can be used as well, for example PT12H for 12 hours or PT30M for 30 minutes.
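The cutoff is the current date and time minus the given period. The Python sketch below models that arithmetic for the day and time part of xsd:duration (days, hours, minutes, seconds). The function names are hypothetical, and this is not the plugin's own parser. Year and month components are rejected on purpose, because their length depends on the calendar.

```python
import re
from datetime import datetime, timedelta, timezone

# Day/time subset of xsd:duration: PnDTnHnMnS
_DURATION = re.compile(
    r"^P(?:(?P<d>\d+)D)?(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?)?$"
)


def parse_day_time_duration(literal: str) -> timedelta:
    m = _DURATION.match(literal)
    # "P", "PT" and a dangling "T" match the pattern but are not valid durations.
    if m is None or literal == "P" or literal.endswith("T"):
        raise ValueError("unsupported or invalid duration: %r" % literal)
    parts = m.groupdict(default="0")
    return timedelta(days=int(parts["d"]), hours=int(parts["h"]),
                     minutes=int(parts["m"]), seconds=float(parts["s"]))


def trim_to_period_cutoff(literal: str, now: datetime) -> datetime:
    # History older than this instant would be removed by hist:trimToPeriod.
    return now - parse_day_time_duration(literal)


cutoff = trim_to_period_cutoff("P3D", datetime.now(timezone.utc))
```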

History filtering

Keeping history for everything is usually unnecessary, as well as time- and resource-consuming, so the plugin lets you keep history only for certain classes or properties. When configuring the index, you need to specify 4 mandatory positions: subject, predicate, object, and context. Each position can have one of the following values:

  • *: Everything is allowed.

  • !IRI, !BNode, or !Literal: Anything except the specified type is allowed.

  • IRI, BNode, or Literal: The entity in this position must be of the specified type. Type names are case-insensitive.

  • an IRI: Only this IRI is allowed.

  • an IRI prefix (http://myIRI*): All IRIs that start with the given prefix are allowed.

Filter examples

  • * * literal *: Match statements that contain any literal in the object position.

  • * * !literal *: Match statements that do not contain any literal in the object position.

  • * http://example.com/name * *: Match statements whose predicate is http://example.com/name.

  • http://example.com/person/* * * *: Match statements whose subject is an IRI starting with http://example.com/person/.

A statement is kept in the history if it matches at least one of the provided statement templates.
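The matching rules can be summarized as a small Python model. It is a sketch, not the plugin's implementation. Terms are modelled as hypothetical (kind, value) pairs, and a statement in the default graph would need some placeholder term in the context position. The model treats an empty filter list as "keep everything", which matches the walkthrough below, where history is recorded before any filter is added.

```python
from typing import List, Tuple

# A term is modelled as (kind, value), kind being "iri", "bnode" or "literal".
Term = Tuple[str, str]
Statement = Tuple[Term, Term, Term, Term]  # subject, predicate, object, context
KINDS = {"iri", "bnode", "literal"}


def position_matches(pattern: str, term: Term) -> bool:
    kind, value = term
    lowered = pattern.lower()
    if pattern == "*":                      # everything is allowed
        return True
    if lowered.startswith("!") and lowered[1:] in KINDS:
        return kind != lowered[1:]          # anything except this type
    if lowered in KINDS:
        return kind == lowered              # exactly this type, case-insensitive
    if pattern.endswith("*"):
        return kind == "iri" and value.startswith(pattern[:-1])  # IRI prefix
    return kind == "iri" and value == pattern  # one specific IRI


def statement_matches(filter_text: str, stmt: Statement) -> bool:
    patterns = filter_text.split()
    if len(patterns) != 4:
        raise ValueError("a filter needs 4 positions: subject predicate object context")
    return all(position_matches(p, t) for p, t in zip(patterns, stmt))


def kept_in_history(filters: List[str], stmt: Statement) -> bool:
    # No filters: everything is recorded. Otherwise one matching template is enough.
    return not filters or any(statement_matches(f, stmt) for f in filters)


name_stmt = (("iri", "urn:Kirk"), ("iri", "http://example.com/name"),
             ("literal", "James"), ("iri", "urn:graph"))
print(kept_in_history(["* * !literal *"], name_stmt))  # False
```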

Manage filters

  • Add filter

    INSERT DATA {
        [] <http://www.ontotext.com/at/addFilters> "* * LITERAL *"
    }
    
  • Remove filter

    INSERT DATA {
        [] <http://www.ontotext.com/at/removeFilters> "* * LITERAL *"
    }
    
  • List filters

    SELECT ?filter WHERE {
        [] <http://www.ontotext.com/at/getFilters> ?filter
    }
    

Query process and examples

  1. Enable the plugin:

    INSERT DATA {
        [] <http://www.ontotext.com/at/enabled> true
    }
    
  2. Insert the data you want to query:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    INSERT DATA {
        <urn:Human> rdfs:subClassOf <urn:Mammal> .
        <urn:Commander> rdfs:subClassOf <urn:StarfleetOfficer> .
        <urn:Captain> rdfs:subClassOf <urn:StarfleetOfficer> .
        <urn:Kirk> a <urn:Human> ;
            <urn:dateOfBirth> "2233-03-22"^^xsd:date ;
            <urn:name> "James T. Kirk" ;
            <urn:rank> <urn:Commander> .
    }
    
  3. Change the name of a particular Starfleet officer, so that you can then see how this change is tracked:

    DELETE DATA {
        <urn:Kirk> <urn:name> "James T. Kirk"
    };
    INSERT DATA {
        <urn:Kirk> <urn:name> "James Tiberius Kirk"
    }
    
  4. Query the history of your data:

    1. Find out the specific point in time when data was changed by browsing the history with the following query:

      PREFIX hist: <http://www.ontotext.com/at/>
      SELECT * {
          ?transactionId a hist:history ;
              hist:timestamp ?time ;
              hist:graph ?g ;
              hist:subject ?s ;
              hist:predicate ?p ;
              hist:object ?o ;
              hist:insert ?i ;
              hist:username ?username .
      }
      

      The retrieved results are in descending order, that is, the most recent change comes first:

      [Figure: history query results, most recent change first]

      In the ?transactionId a hist:history pattern, ?transactionId is bound to the transaction ID of each transaction in the collected history.

      The username of the user who executed the transaction can be accessed via the hist:username predicate. In this example, the initial data was imported by “john.smith”, while the last change was executed by “mary.green”.

    2. Let’s see how we can use a negation filter.

      1. Run the following update to add the negation filter from the filter examples above, which keeps only statements that do not have a literal in the object position:

        INSERT DATA {
            [] <http://www.ontotext.com/at/addFilters> "* * !LITERAL *"
        }
        
      2. Now, let’s add a second date of birth for the Commander:

        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        INSERT DATA {
            <urn:Kirk> <urn:dateOfBirth> "2633-03-22"^^xsd:date .
        }
        
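      3. If you run the history query from step 4.1 again, you will see that the new statement has not been recorded in the history, since its object is a literal. The filter affects only the history; the statement itself is still added to the repository.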

    3. You can also find out what changes were made to a given subject and predicate between two points in time. This is done with the hist:listWithFilter magic predicate:

      [] hist:listWithFilter (?fromDateTime ?toDateTime ?subject ?predicate ?object ?context).
      

      Note

      The hist:listWithFilter magic predicate replaces the hist:parameters one, which worked the same way but is deprecated as of release 10.4. Note that hist:listWithFilter is used directly and not together with ?transactionId a hist:history.

      As such, hist:listWithFilter will not bind the transaction ID in the subject position. If you need the transaction ID, you can access it through the hist:transactionId predicate as illustrated in the examples below.

      Although the predicate is not mandatory, passing parameters when querying history is much more efficient than fetching all history entries and then filtering them. The order of the parameters matters: when present, the predicate returns only the history entries that match the list. Only bound values are used for filtering, and the remaining parameters may be unbound variables. Because the object is an ordered list, not every position has to be bound, but every position up to the last one you filter by must be present. For example, to filter by subject, the list must contain at least ?fromDateTime ?toDateTime ?subject, where ?fromDateTime and ?toDateTime may be left unbound.

      The following query returns all changes made within a given time period:

      PREFIX hist: <http://www.ontotext.com/at/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      SELECT * {
          [] hist:listWithFilter ("2022-07-12T16:17:00"^^xsd:dateTime "2022-07-12T16:20:00"^^xsd:dateTime);
             hist:transactionId ?transactionId ;
             hist:timestamp ?time ;
             hist:graph ?g ;
             hist:subject ?s ;
             hist:predicate ?p ;
             hist:object ?o ;
             hist:insert ?i
      }
      

      You can also query for all changes for a particular subject and predicate. Note that the ?fromDateTime ?toDateTime parameters are left unbound.

      PREFIX hist: <http://www.ontotext.com/at/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      SELECT ?time ?s ?p ?o ?i {
          [] hist:listWithFilter (?fromDateTime ?toDateTime <urn:Kirk> <urn:name> ?object ?context);
             hist:transactionId ?transactionId ;
             hist:timestamp ?time ;
             hist:graph ?g ;
             hist:subject ?s ;
             hist:predicate ?p ;
             hist:object ?o ;
             hist:insert ?i
      }
      
      [Figure: all changes for <urn:Kirk> and <urn:name>]
    4. You can query the data at a specific point in time by including FROM <http://www.ontotext.com/at/xxx>, where xxx is a date-time in the format: yyyy[[[[[MM]dd]HH]mm]ss]. For example:

      # Return data as it looked on 2022-07-12 16:17:17 server time
      #
      SELECT ?name ?rank ?dateOfBirth FROM <http://www.ontotext.com/at/20220712161717> {
          BIND(<urn:Kirk> as ?officer)
          ?officer <urn:name> ?name ;
                  <urn:rank> ?rank ;
                  <urn:dateOfBirth> ?dateOfBirth .
      }
      
      [Figure: data for <urn:Kirk> as of 2022-07-12 16:17:17]

      The same query also works when only the date is specified:

      # Return data as it looked on 2022-07-12 00:00:00 server time
      # (year, month, and day only)
      #
      SELECT ?name ?rank ?dateOfBirth FROM <http://www.ontotext.com/at/20220712> {
          BIND(<urn:Kirk> as ?officer)
          ?officer <urn:name> ?name ;
                  <urn:rank> ?rank ;
                  <urn:dateOfBirth> ?dateOfBirth .
      }
      

      To retrieve all data for that particular Starfleet officer at a specific point in time, you can also use a DESCRIBE query:

      DESCRIBE <urn:Kirk> FROM <http://www.ontotext.com/at/20220712161717>
      

      The result from our example at that point in time would be:

      [Figure: DESCRIBE result for <urn:Kirk> as of 2022-07-12 16:17:17]

    Note

    Statements that have history will use the history data according to the requested point in time. Statements that do not have history will be returned directly, assuming they were never modified and existed at the requested point as well.
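The point-in-time IRI format and the note above can be modelled in a few lines of Python. This is a conceptual sketch with hypothetical names (at_graph_iri, state_at) and a simplified history entry of (epoch milliseconds, triple, inserted). It does not describe how the plugin evaluates queries internally. The first function builds the FROM IRI at a chosen precision from a naive datetime taken to be in server time. The second reconstructs a past state by undoing, newest first, every change recorded after the requested moment. Triples with no history entries pass through unchanged, as the note describes.

```python
from datetime import datetime
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Entry = Tuple[int, Triple, bool]  # (epoch millis, triple, True = INSERT / False = DELETE)


def at_graph_iri(moment: datetime, precision: int = 6) -> str:
    # precision = number of fields kept: 1 = yyyy, 3 = yyyyMMdd, 6 = yyyyMMddHHmmss
    if not 1 <= precision <= 6:
        raise ValueError("precision must be between 1 and 6")
    digits = moment.strftime("%Y%m%d%H%M%S")
    return "http://www.ontotext.com/at/" + digits[: 4 + 2 * (precision - 1)]


def state_at(current: Set[Triple], history: List[Entry], at_millis: int) -> Set[Triple]:
    state = set(current)
    for ts, triple, inserted in sorted(history, key=lambda e: e[0], reverse=True):
        if ts <= at_millis:
            break                   # older changes were already in effect at that moment
        if inserted:
            state.discard(triple)   # inserted later, so absent at at_millis
        else:
            state.add(triple)       # deleted later, so still present at at_millis
    return state


print(at_graph_iri(datetime(2022, 7, 12, 16, 17, 17)))  # http://www.ontotext.com/at/20220712161717
```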

Index components

The plugin index is of the type DSPOCI, meaning that it consists of the following components indexed in the order shown:

  • Date-time: a 64-bit long value that represents the exact time an operation occurred, with millisecond precision. All operations in the same transaction have the same date-time value.

  • Subject: the statement subject, 32 or 40 bits long.

  • Predicate: the statement predicate, 32 or 40 bits long.

  • Object: the statement object, 32 or 40 bits long.

  • Context: the statement context, 32 or 40 bits long. Special values are used for explicit statements in the default graph and for implicit statements. Including the implicit statements provides transparent support for transactions.

  • Insert: a boolean value stored using as few bits as practical. True represents an INSERT, and false represents a DELETE.

The index is ordered by each component going from left to right, where the date-time component is ordered in descending order (most recent updates come first), and all other components are ordered in ascending order. For example:

Date-time       Subject    Predicate   Object                  Context                               Insert
1570623056397   urn:Kirk   urn:name    "James Tiberius Kirk"   <http://www.ontotext.com/explicit>    False
1570623056397   urn:Kirk   urn:name    "James T. Kirk"         <http://www.ontotext.com/explicit>    True
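The ordering can be illustrated with a small Python model. This is a sketch only. HistoryRow, dspoci_key, and rows_between are hypothetical names, and entity IDs are plain integers. The real index is a persistent structure, not a Python list. Because the newest entries come first, a date-time range maps to one contiguous slice that a binary search can locate without scanning the whole log.

```python
from typing import Callable, List, NamedTuple


class HistoryRow(NamedTuple):
    datetime_ms: int  # epoch milliseconds, shared by all operations of a transaction
    subject: int      # integers stand in for the 32- or 40-bit entity IDs
    predicate: int
    obj: int
    context: int
    insert: bool      # True = INSERT, False = DELETE


def dspoci_key(row: HistoryRow):
    # Date-time descending (newest first), all other components ascending.
    return (-row.datetime_ms, row.subject, row.predicate, row.obj, row.context, row.insert)


def _first(index: List[HistoryRow], pred: Callable[[HistoryRow], bool]) -> int:
    # Binary search for the first row where pred holds; pred must go False...True.
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(index[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def rows_between(index: List[HistoryRow], from_ms: int, to_ms: int) -> List[HistoryRow]:
    """Rows with from_ms <= date-time <= to_ms, from an index sorted by dspoci_key."""
    start = _first(index, lambda r: r.datetime_ms <= to_ms)
    end = _first(index, lambda r: r.datetime_ms < from_ms)
    return index[start:end]
```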

Tip

Due to the order of the index components, the most time-efficient way to query the history is to restrict it first by date-time and then by subject. This is particularly true for queries that use hist:listWithFilter.
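An application that generates hist:listWithFilter queries has to respect the positional argument list: every position up to the last bound one must be present. The Python sketch below builds such a list, filling unbound positions with the variable names used in the examples and dropping trailing unbound ones. list_with_filter_args is a hypothetical helper. IRIs are passed as ready-made SPARQL terms such as "<urn:Kirk>". The generated query still needs the hist: and xsd: prefixes declared, as in the examples above.

```python
from datetime import datetime
from typing import Optional

POSITIONS = ("fromDateTime", "toDateTime", "subject", "predicate", "object", "context")


def list_with_filter_args(from_dt: Optional[datetime] = None,
                          to_dt: Optional[datetime] = None,
                          subject: Optional[str] = None,
                          predicate: Optional[str] = None,
                          obj: Optional[str] = None,
                          context: Optional[str] = None) -> str:
    def dt_term(d: Optional[datetime]) -> Optional[str]:
        return None if d is None else '"%s"^^xsd:dateTime' % d.isoformat()

    values = [dt_term(from_dt), dt_term(to_dt), subject, predicate, obj, context]
    last = max((i for i, v in enumerate(values) if v is not None), default=-1)
    if last < 0:
        raise ValueError("bind at least one position")
    # Unbound positions before the last bound one become variables.
    terms = [v if v is not None else "?" + name
             for name, v in zip(POSITIONS[: last + 1], values[: last + 1])]
    return "(" + " ".join(terms) + ")"


print("[] hist:listWithFilter %s ;" % list_with_filter_args(subject="<urn:Kirk>"))
```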