Data history and versioning
What the plugin does
The Data history and versioning plugin enables you to access past states of your database by versioning data at the level of the RDF data model. Collecting and querying the history of a database benefits users and organizations that want to preserve all of their historical data. Such users often face a common use case: “I want to know when a value in the database changed, and what the previous state of the system was at that time.”
The plugin remembers changes from multiple transactions and provides the means to track historical data. Changes in the repository are tracked globally for all users and all updates can be queried and processed at once. The tracked data is persisted to disk and is available after a restart.
It is useful in several typical cases, such as:
Generating a “diff” between generations while data updates are loaded into the system on a regular basis, either through ETL or a change data stream;
Answering the question of what has changed between moment A and moment B, for example: “After an application change was implemented over the weekend, I need to compare the deployment footprint or configuration of the before/after situation”;
Maintaining history only for specific classes or properties, i.e., without keeping history for everything. This is a significant advantage for very large databases, where querying a complete history would require substantial time and system resources;
Searching for the members of a specific team at a given point in time X.
Warning
Querying the history log may be slow when the log is big. If you have a big repository, we recommend using filters to reduce the number of history entries.
Index components
The plugin index is of the type DSPOCI, meaning that it consists of the following components:
Date-time - a 64-bit long value that represents the exact time an operation occurred with millisecond precision. All operations in the same transaction have the same date-time value.
Subject - the statement subject, 32 or 40 bits long.
Predicate - the statement predicate, 32 or 40 bits long.
Object - the statement object, 32 or 40 bits long.
Context - the statement context, 32 or 40 bits long. Special values are used for explicit statements in the default graph and for implicit statements. By including the implicit statements, we get transparent support for transactions.
Insert - a boolean value stored in as few bits as practical. A value of true represents an INSERT, and false represents a DELETE.
The index is ordered by each component going from left to right, where the date-time component is ordered in descending order (most recent updates come first), and all other components are ordered in ascending order. For example:
Date-time | Subject | Predicate | Object | Context | Insert
---|---|---|---|---|---
1570623056397 | 34 | 1 | 29 | -3 | TRUE
1570623056397 | 34 | 1 | 38 | -2 | TRUE
1570623042812 | 34 | 1 | 30 | -2 | FALSE
1570623042812 | 34 | 2 | 31 | -2 | FALSE
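The ordering rule above can be expressed as an ordinary sort key. The following Python sketch illustrates the rule and is not the plugin's storage code. HistoryEntry and dspoci_key are hypothetical names, and entity IDs are plain integers, as in the example table. Sorting the table's rows from a shuffled order with this key reproduces the order shown.

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    """One DSPOCI index row (hypothetical model, not the plugin's on-disk format)."""
    datetime_ms: int  # epoch milliseconds; entries of one transaction share it
    subject: int      # internal entity IDs, as in the example table
    predicate: int
    obj: int
    context: int      # negative values stand in for the special contexts
    insert: bool      # True = INSERT, False = DELETE


def dspoci_key(entry: HistoryEntry):
    # Date-time descending (newest first); all other components ascending.
    return (-entry.datetime_ms, entry.subject, entry.predicate,
            entry.obj, entry.context, entry.insert)


# The rows of the example table, deliberately shuffled.
rows = [
    HistoryEntry(1570623042812, 34, 2, 31, -2, False),
    HistoryEntry(1570623056397, 34, 1, 38, -2, True),
    HistoryEntry(1570623042812, 34, 1, 30, -2, False),
    HistoryEntry(1570623056397, 34, 1, 29, -3, True),
]

for row in sorted(rows, key=dspoci_key):
    print(row)
```

Because the date-time comes first, all entries for a time range sit next to each other in this order. This is consistent with the tip that follows.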
Tip
Due to the order of the index components, the most time-efficient way to query your data is first by date-time and then by subject. This is particularly true when you pass parameters with the hist:parameters predicate, as described in the examples below.
Usage
Enable/disable plugin
Enabling and disabling the plugin affects only the collection of history, which is disabled by default. You can query the collected history at any moment, whether collection is enabled or not.
To enable the plugin, execute the following query:
INSERT DATA { [] <http://www.ontotext.com/at/enabled> true }
To disable it, execute:
INSERT DATA { [] <http://www.ontotext.com/at/enabled> false }
To check the current enabled status, execute:
SELECT ?enabled { [] <http://www.ontotext.com/at/enabled> ?enabled }
Clear all data
If you want to clear all data in your repository, you must first disable collecting history, because there is no way to keep usable history after this operation. For example:
You try to execute CLEAR ALL, but get an error. The reason is that clearing all statements in the repository is incompatible with collecting history. Disable collecting history if you really want to clear all data.
You disable collecting history and retry CLEAR ALL: all data in the repository is deleted. All history data is deleted as well, since it is no longer usable.
Clear history
You can also delete only the history without deleting the data in the repository or having to disable collecting history. Execute:
PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
[] hist:clearHistory [] .
}
Trim history
The history can also be trimmed in various ways:
Delete history before a certain date
PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
[] hist:trimBefore "2022-11-29" .
}
The provided literal must be interpretable as xsd:date or xsd:dateTime.
If only the date is specified, the time is assumed to be midnight (00:00:00). By default, the system time zone is used. For more precise trimming, specify a full date-time.
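The date-only rule can be illustrated with a small Python sketch. It turns such a literal into an epoch-millisecond cutoff, which is comparable with the date-time index component. This is an illustration under stated assumptions, not the plugin's code. The helper trim_before_cutoff_ms is hypothetical. The system time zone is taken from the local machine settings. History entries with a date-time below the cutoff are assumed to be the ones trimmed.

```python
from datetime import date, datetime, time, timezone, tzinfo
from typing import Optional


def trim_before_cutoff_ms(literal: str, tz: Optional[tzinfo] = None) -> int:
    """Hypothetical helper: xsd:date / xsd:dateTime lexical value -> epoch ms.

    A date-only value means midnight (00:00:00). A value without an explicit
    offset is read in `tz`, or in the system time zone when `tz` is None.
    Date-only values carrying a time-zone suffix are not handled here.
    """
    if literal.endswith("Z"):
        # fromisoformat() accepts a trailing "Z" only from Python 3.11 on
        literal = literal[:-1] + "+00:00"
    if "T" in literal:
        moment = datetime.fromisoformat(literal)
    else:
        moment = datetime.combine(date.fromisoformat(literal), time(0, 0, 0))
    if moment.tzinfo is None:
        # astimezone() on a naive datetime attaches the system local time zone
        moment = moment.replace(tzinfo=tz) if tz is not None else moment.astimezone()
    return round(moment.timestamp() * 1000)


# History entries whose date-time is below this value would be trimmed.
print(trim_before_cutoff_ms("2022-11-29", timezone.utc))  # 1669680000000
```

With the system time zone, the cutoff shifts by the local UTC offset. This is one reason to prefer a full date-time when precision matters.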
Trim history by size
Size here means the number of statements in the history log to be preserved.
PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
[] hist:trimToSize 1000 .
}
Trim the history to a given period from the current date and time
PREFIX hist: <http://www.ontotext.com/at/>
INSERT DATA {
[] hist:trimToPeriod "P3D" .
}
The provided literal must be interpretable as xsd:duration.
Here, P3D means 3 days, so only the history from the last 3 days remains after the update is executed. You can also specify hours, minutes, and so on, for example PT12H for 12 hours.
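The following Python sketch computes the cutoff that a period such as P3D implies. Everything older than "now minus the duration" would be trimmed. The Python standard library has no xsd:duration parser, so parse_day_time_duration is a hypothetical helper. It handles only the day and time components (PnDTnHnMnS). Years and months are rejected because their length in days varies.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Day/time subset of xsd:duration (PnDTnHnMnS); years and months are unsupported.
_DAY_TIME_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_day_time_duration(value: str) -> timedelta:
    """Hypothetical helper: "P3D" -> 3 days, "PT1H30M" -> 1 hour 30 minutes."""
    match = _DAY_TIME_DURATION.fullmatch(value)
    # Reject non-matching input and empty forms such as "P", "PT" or "P1DT".
    if match is None or value.endswith(("P", "T")):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {unit: float(amount) for unit, amount in match.groupdict().items() if amount}
    return timedelta(**parts)


def trim_to_period_cutoff(period: str, now: Optional[datetime] = None) -> datetime:
    """History entries older than the returned moment would be removed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_day_time_duration(period)
```

For instance, trim_to_period_cutoff("P3D", datetime(2022, 12, 2, tzinfo=timezone.utc)) returns midnight UTC on 2022-11-29.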
History filtering
Keeping history for everything is usually unnecessary, as well as quite time- and resource-consuming. For this reason, the plugin lets you keep history only for certain classes or properties. When configuring the index, you need to specify 4 mandatory positions: subject, predicate, object, and context. Each position can have one of the following values:
* - everything is allowed
IRI, BNode, or Literal - the entity in this position must be of the specified type (case-insensitive)
an IRI - only this IRI is allowed
an IRI prefix (http://myIRI*) - all IRIs that start with the given prefix are allowed
Filter examples
* * literal * : match statements that have any literal in the object position
* http://example.com/name * * : match statements whose predicate is http://example.com/name
http://example.com/person/* * * * : match statements whose subject is an IRI starting with http://example.com/person/
A statement is kept in the history if it matches at least one of the provided statement templates.
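The matching rules above can be summarized in a short Python sketch. It is a conceptual model only. RDF terms are represented as hypothetical (kind, value) pairs, and a template is the same space-separated string passed to addFilters. The sketch does not claim to reproduce every detail of how the plugin parses filter strings. For example, it assumes that any position ending in * (other than a lone *) is an IRI prefix.

```python
from typing import Iterable, Tuple

Term = Tuple[str, str]                     # (kind, value); kind: "iri", "bnode", "literal"
Statement = Tuple[Term, Term, Term, Term]  # subject, predicate, object, context
KINDS = {"iri", "bnode", "literal"}


def position_matches(pattern: str, term: Term) -> bool:
    kind, value = term
    if pattern == "*":                          # everything is allowed
        return True
    if pattern.lower() in KINDS:                # IRI / BNode / Literal, case-insensitive
        return kind == pattern.lower()
    if pattern.endswith("*"):                   # IRI prefix, e.g. http://myIRI*
        return kind == "iri" and value.startswith(pattern[:-1])
    return kind == "iri" and value == pattern   # exactly this IRI


def matches_template(template: str, statement: Statement) -> bool:
    positions = template.split()
    if len(positions) != 4:
        raise ValueError("a filter needs subject, predicate, object and context positions")
    return all(position_matches(p, t) for p, t in zip(positions, statement))


def keep_in_history(templates: Iterable[str], statement: Statement) -> bool:
    # A statement is kept if it matches at least one template.
    return any(matches_template(t, statement) for t in templates)


graph = ("iri", "http://example.com/graph")
kirk_name = (("iri", "http://example.com/person/kirk"),
             ("iri", "http://example.com/name"),
             ("literal", "James T. Kirk"),
             graph)
print(keep_in_history(["* * literal *"], kirk_name))  # True
```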
Manage filters¶
Add filter
INSERT DATA { [] <http://www.ontotext.com/at/addFilters> "* * LITERAL *" }
Remove filter
INSERT DATA { [] <http://www.ontotext.com/at/removeFilters> "* * LITERAL *" }
List filters
SELECT ?filter WHERE { [] <http://www.ontotext.com/at/getFilters> ?filter }
Query process and examples
Enable the plugin:
INSERT DATA { [] <http://www.ontotext.com/at/enabled> true }
Insert the data you want to query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
INSERT DATA {
    <urn:Human> rdfs:subClassOf <urn:Mammal> .
    <urn:Commander> rdfs:subClassOf <urn:StarfleetOfficer> .
    <urn:Captain> rdfs:subClassOf <urn:StarfleetOfficer> .
    <urn:Kirk> a <urn:Human> ;
        <urn:dateOfBirth> "2233-03-22"^^xsd:date ;
        <urn:name> "James T. Kirk" ;
        <urn:rank> <urn:Commander> .
}
Change the name of a particular Starfleet officer, so that you can then see how this change is tracked:
DELETE DATA { <urn:Kirk> <urn:name> "James T. Kirk" } ;
INSERT DATA { <urn:Kirk> <urn:name> "James Tiberius Kirk" }
Query the history of your data:
Find out the specific point in time when data was changed by browsing the history with the following query:
PREFIX hist: <http://www.ontotext.com/at/>
SELECT * {
    ?log a hist:history ;
        hist:timestamp ?time ;
        hist:graph ?g ;
        hist:subject ?s ;
        hist:predicate ?p ;
        hist:object ?o ;
        hist:insert ?i
}
The retrieved results are in descending order, i.e., the most recent change comes first.
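Once the rows of such a query are fetched, a client can condense them into a net diff. This covers the "what changed between moment A and moment B" use case from the introduction. The following Python sketch assumes the rows have already been retrieved with a SPARQL client of your choice. It also assumes they have been converted into hypothetical HistoryRow tuples, with hist:timestamp turned into epoch milliseconds. The rows are replayed oldest first, so that a later change to the same statement overrides an earlier one.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Statement = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


class HistoryRow(NamedTuple):
    time: int            # hist:timestamp as epoch milliseconds (assumed conversion)
    statement: Statement
    insert: bool         # hist:insert


def net_diff(rows: Iterable[HistoryRow], start: int,
             end: int) -> Tuple[Set[Statement], Set[Statement]]:
    """Return (added, removed) for changes with start < time <= end."""
    added: Set[Statement] = set()
    removed: Set[Statement] = set()
    # The query returns the newest rows first; replay them oldest first.
    for row in sorted(rows, key=lambda r: r.time):
        if not start < row.time <= end:
            continue
        if row.insert:
            if row.statement in removed:
                removed.discard(row.statement)  # deleted, then re-inserted: no net change
            else:
                added.add(row.statement)
        else:
            if row.statement in added:
                added.discard(row.statement)    # inserted, then deleted: no net change
            else:
                removed.add(row.statement)
    return added, removed
```

Applied to the renaming step above, a window that covers only the rename would report the new name as added and the old name as removed.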
You can also find out what changes were made to a subject and a predicate within a specific time period, between moment A and moment B. To do this, use the hist:parameters predicate as follows:
?log hist:parameters (?fromDateTime ?toDateTime ?subject ?predicate ?object ?context) .
The predicate is not mandatory, but passing parameters when querying history is much more efficient than fetching all history entries and then filtering them. The order of the parameters is important. When the predicate is present, it returns only the history entries that match the list. Only bound values are used for matching, so some parameters may be left unbound. Not all bindings are required. However, because the object of the predicate is an ordered list, you must include all positions up to the one you filter on. For example, to filter by subject, you must provide at least ?fromDateTime ?toDateTime ?subject. The ?fromDateTime and ?toDateTime parameters may be left unbound.
The following query returns all changes made within a given time period:
PREFIX hist: <http://www.ontotext.com/at/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * {
    ?log a hist:history ;
        hist:parameters ("2020-01-17T14:38:50"^^xsd:dateTime "2020-01-17T15:00:00"^^xsd:dateTime) ;
        hist:timestamp ?time ;
        hist:graph ?g ;
        hist:subject ?s ;
        hist:predicate ?p ;
        hist:object ?o ;
        hist:insert ?i
}
You can also find out all changes for a particular subject and predicate. Note that the ?fromDateTime and ?toDateTime parameters are left unbound.
PREFIX hist: <http://www.ontotext.com/at/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?time ?s ?p ?o ?i {
    ?log a hist:history ;
        hist:parameters (?fromDateTime ?toDateTime <urn:Kirk> <urn:name> ?object ?context) ;
        hist:timestamp ?time ;
        hist:graph ?g ;
        hist:subject ?s ;
        hist:predicate ?p ;
        hist:object ?o ;
        hist:insert ?i
}
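The positional semantics of the hist:parameters list can be sketched in Python as a plain filter over history rows. This is a client-side model only, with hypothetical names (Row, matches_parameters). The real predicate is faster because the plugin applies these restrictions while scanning the index. In the sketch, None stands for an unbound item, and a shorter list behaves as if its missing trailing items were unbound. The text does not state whether the time bounds are inclusive, so the sketch assumes both are.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class Row(NamedTuple):
    """A history entry as a client might hold it (hypothetical type)."""
    time: datetime
    subject: str
    predicate: str
    obj: str
    context: str
    insert: bool


def matches_parameters(row: Row, params: Sequence[Optional[object]]) -> bool:
    """Model of (fromDateTime toDateTime subject predicate object context); None = unbound."""
    if len(params) > 6:
        raise ValueError("at most six positional parameters are expected")
    # Missing trailing items behave like unbound ones.
    start, end, s, p, o, c = list(params) + [None] * (6 - len(params))
    # Assumption: both time bounds are inclusive.
    if start is not None and row.time < start:
        return False
    if end is not None and row.time > end:
        return False
    wanted = (s, p, o, c)
    actual = (row.subject, row.predicate, row.obj, row.context)
    return all(w is None or w == a for w, a in zip(wanted, actual))
```

For instance, matches_parameters(row, (None, None, "urn:Kirk", "urn:name")) mirrors the second query above.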
You can query the data at a specific point in time by including FROM <http://www.ontotext.com/at/xxx>, where xxx is a date-time in the format yyyy[MM[dd[HH[mm[ss]]]]], i.e., trailing components may be omitted. For example:
# Return data as it looked on 2020-01-17 14:38:55 server time
SELECT ?name ?rank ?dateOfBirth
FROM <http://www.ontotext.com/at/20200117143855>
{
    BIND(<urn:Kirk> AS ?officer)
    ?officer <urn:name> ?name ;
        <urn:rank> ?rank ;
        <urn:dateOfBirth> ?dateOfBirth .
}
The same query will return a valid graph with only the date specified:
# Return data as it looked on 2020-01-17 00:00:00 server time
# (date only; the time defaults to midnight)
SELECT ?name ?rank ?dateOfBirth
FROM <http://www.ontotext.com/at/20200117>
{
    BIND(<urn:Kirk> AS ?officer)
    ?officer <urn:name> ?name ;
        <urn:rank> ?rank ;
        <urn:dateOfBirth> ?dateOfBirth .
}
To retrieve all data for that particular Starfleet officer at a specific point in time, you can also use a DESCRIBE query:
DESCRIBE <urn:Kirk> FROM <http://www.ontotext.com/at/20200117143855>
The result contains the statements about <urn:Kirk> as they were at that point in time.
Note
Statements that have history will use the history data according to the requested point in time. Statements that do not have history will be returned directly, assuming they were never modified and existed at the requested point as well.
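The point-in-time behavior, including the fallback described in the note, can be modeled in Python. This is a simplified sketch under assumptions, not the plugin's algorithm. First, parse_at_graph_suffix is a hypothetical helper that reads the xxx part of FROM <http://www.ontotext.com/at/xxx> (year, then optionally month, day, hour, minute, and second). It fills omitted trailing components with their earliest value; the examples above only confirm that a date-only value means 00:00:00. Second, graphs are omitted and times are naive, standing in for server time. Third, if the first recorded change to a statement is a DELETE, the statement is assumed to have existed before it. The rename time used in the usage example is also hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple

Statement = Tuple[str, str, str]  # (subject, predicate, object); graphs omitted


class Change(NamedTuple):
    time: datetime       # naive datetime standing in for server time
    statement: Statement
    insert: bool         # True = INSERT, False = DELETE


def parse_at_graph_suffix(suffix: str) -> datetime:
    """Hypothetical helper: compact date-time with omittable trailing parts -> datetime."""
    if not suffix.isdigit() or len(suffix) not in (4, 6, 8, 10, 12, 14):
        raise ValueError(f"not a compact date-time: {suffix!r}")
    # Pad with MMddHHmmss defaults: month/day 01, time 00:00:00.
    padded = suffix + "0101000000"[len(suffix) - 4:]
    bounds = ((0, 4), (4, 6), (6, 8), (8, 10), (10, 12), (12, 14))
    year, month, day, hour, minute, second = (int(padded[i:j]) for i, j in bounds)
    return datetime(year, month, day, hour, minute, second)


def statements_at(moment: datetime, current: Set[Statement],
                  history: Iterable[Change]) -> Set[Statement]:
    """Reconstruct which statements were visible at `moment`."""
    changes_by_statement: Dict[Statement, List[Change]] = {}
    for change in history:
        changes_by_statement.setdefault(change.statement, []).append(change)

    # Statements without history are returned as they are now (see the note above).
    visible = {st for st in current if st not in changes_by_statement}
    for st, changes in changes_by_statement.items():
        changes.sort(key=lambda c: c.time)
        before = [c for c in changes if c.time <= moment]
        if before:
            present = before[-1].insert  # the latest change at or before `moment` wins
        else:
            # Assumption: a first-ever DELETE means the statement existed earlier.
            present = not changes[0].insert
        if present:
            visible.add(st)
    return visible
```

With the Kirk example, a moment before the rename yields the old name and the unchanged rank. A moment after the rename yields the new name.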