Graph replacement optimization

Clearing an old graph and then importing the new data is often inefficient. Because the two operations are handled separately, the engine cannot tell whether a statement that is about to be deleted will also be present in the new graph and could therefore simply be kept. The same applies to preserving connector data and inferred statements. For this reason, GraphDB offers an optimized graph replacement algorithm that makes graph updates faster when the new graph partially overlaps with the data in the old one.

The graph replacement optimization takes effect when the replacement is done in a single transaction and the size of that transaction exceeds a threshold. By default, the threshold is 1,000, and it can be changed with the graphdb.engine.min-replace-graph-tx-size configuration parameter.
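As a rough illustration, the threshold could be raised at startup. This sketch assumes GraphDB is started through its `graphdb` launcher script and that the parameter is passed as a `-D` Java system property. The value 5000 and the variable names are arbitrary choices made for the example, not recommendations.

```shell
# Hypothetical threshold value, chosen only for illustration.
MIN_REPLACE_TX_SIZE=5000

# Assumption: the parameter is supplied as a -D system property.
GRAPHDB_OPT="-Dgraphdb.engine.min-replace-graph-tx-size=${MIN_REPLACE_TX_SIZE}"

# Print the startup command instead of running it, so it can be reviewed first.
echo "graphdb ${GRAPHDB_OPT}"
```

A larger threshold means that small replacement transactions go through the regular clear-and-insert path.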

The algorithm has the following steps:

  1. Check transaction contents. If the transaction includes a graph replacement and is of sufficient size, proceed.

  2. Check which of the graphs to be replaced are valid and contain data, and store their identifiers in a list.

  3. While processing transaction statements for insertion, if their context (graph) matches an identifier from the list, store them inside a tracker.

  4. While clearing the graph to be replaced, if it is not mentioned in the tracker, directly delete all its contents.

  5. If a graph is mentioned in the tracker, iterate over its triples.

  6. Triples in the graph being replaced that are also in the tracker are preserved. All other triples in that graph are deleted.
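The preserve-or-delete decision in steps 5 and 6 can be pictured as a comparison between two sets. The following is a toy model only, not GraphDB's implementation: each line stands for one statement, the file names and sample statements are invented, and the standard `comm` utility does the comparison.

```shell
#!/bin/sh
# Toy model of steps 3-6 for a single graph; one line = one statement.
work=$(mktemp -d)

# Statements currently stored in the graph that is being replaced.
printf '%s\n' ':a :p :b' ':a :p :c' ':d :q :e' | LC_ALL=C sort > "$work/old"

# The "tracker": statements the transaction inserts into that same graph.
printf '%s\n' ':a :p :c' ':d :q :e' ':f :r :g' | LC_ALL=C sort > "$work/tracker"

# Stored statements that are also in the tracker are preserved.
preserved=$(LC_ALL=C comm -12 "$work/old" "$work/tracker")

# Stored statements that are missing from the tracker are deleted.
deleted=$(LC_ALL=C comm -23 "$work/old" "$work/tracker")

printf 'preserved:\n%s\ndeleted:\n%s\n' "$preserved" "$deleted"
rm -rf "$work"
```

With this sample input, `:a :p :c` and `:d :q :e` are preserved and only `:a :p :b` is deleted, so just one statement goes through the deletion path.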

Deletions may trigger re-inference, which makes them costlier than the check described in the algorithm. As a result, the optimization yields a speedup of up to 200% in some test cases.

Here is an example of an update that will use the replacement optimization algorithm:

curl -X PUT -H "Content-Type: application/x-trig" --data-binary '@test_modified.trig'\
    'http://localhost:7200/repositories/test/rdf-graphs/service?graph=http://example.org/optimizations/replacement'

By contrast, the following approach will not use the optimization since it performs the replacement in two separate steps:

curl -X POST -H 'Content-Type: application/sparql-update'\
    --data-binary 'CLEAR GRAPH <http://example.org/optimizations/replacement>'\
    'http://localhost:7200/repositories/test/statements'
curl -X POST -H "Content-Type: application/x-trig" --data-binary '@test_modified.trig'\
    'http://localhost:7200/repositories/test/statements'

Note

The replacement optimization described here applies to all forms of transactions, i.e., it is triggered by standard PUT requests, such as the one in the example, but also by SPARQL INSERT queries containing the http://www.ontotext.com/replaceGraph predicate, such as <http://any/subject> <http://www.ontotext.com/replaceGraph> <http://example.org/graph>.
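A SPARQL update of this kind might look like the sketch below. It assumes the same `test` repository and graph IRI as the curl examples in this section, and that the marker triple is sent in the same update as the new graph contents. The statement inside the GRAPH block is made-up sample data.

```shell
# Assumed endpoint: same repository as in the other examples in this section.
ENDPOINT='http://localhost:7200/repositories/test/statements'

# The first triple marks the graph for replacement; the GRAPH block carries
# the new contents (a single invented statement here).
UPDATE=$(cat <<'EOF'
INSERT DATA {
  <http://any/subject> <http://www.ontotext.com/replaceGraph> <http://example.org/optimizations/replacement> .
  GRAPH <http://example.org/optimizations/replacement> {
    <http://example.org/s> <http://example.org/p> "new value" .
  }
}
EOF
)

# Send everything as one update so the replacement stays in a single transaction.
curl -X POST -H 'Content-Type: application/sparql-update' \
    --data-binary "$UPDATE" "$ENDPOINT" \
    || echo 'request failed (is GraphDB running?)' >&2
```

As with the PUT request, the optimization only applies if the transaction is larger than the configured threshold.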