Nested repositories

What are nested repositories

Nested repositories is a technique for sharing RDF data between multiple GraphDB repositories. It is most useful when several logically independent repositories need to make use of a large (reference) dataset, e.g., a combination of one or more LOD datasets (GeoNames, DBpedia, MusicBrainz, etc.), but where each repository adds its own specific data. This mechanism allows the data in the common repository to be logically included, or ‘nested’, within other repositories that extend it. A repository that is nested in another repository (possibly into more than one other repository) is called a ‘parent repository’ while a repository that nests a parent repository is called a ‘child repository’. The RDF data in the common repository is combined with the data in each child repository for inference purposes. Changes in the common repository are reflected across all child repositories and inferences are maintained to be logically consistent.

Results for queries against a child repository are computed from the contents of the child repository, as well as the nested repository. The following diagram illustrates the nested repositories concept:

_images/nested_repositories_diagram1.jpg

Note

When two child repositories extend the same nested repository, they remain logically separate. Only changes made to the common nested repository will affect any child repositories.

Inference, indexing and queries

A child repository ignores all values for its ruleset configuration parameter and automatically uses the same ruleset as its parent repository. Child repositories compute inferences based on the union of the explicit data stored in the child and parent repositories. Changes to either parent or child cause the set of inferred statements in the child to be updated.

Note

The child repository must be initialised (running) when updates to the parent repository take place, otherwise the child can become logically inconsistent.

When a parent repository is updated, before its transaction is committed, it updates every connected child repository by a set of statement INSERT/DELETE operations. When a child repository is updated, any new resources are recorded in the parent dictionary so that the same resource is indexed in the sibling child repositories using the same internal identifier.

Note

A current limitation on the implementation is that no updates using the owl:sameAs predicate are permitted.

Queries executed on a child repository should perform almost as well as queries executed against a repository containing all the data (from both parent and child repositories).

Configuration

Both parent and child repositories must be deployed using Tomcat and they must deployed to the same instance on the same machine (same JVM).

Repositories that are configured to use the nesting mechanism must be created using specific RDF4J SAIL types:

  • owlim:ParentSail - for parent (shared) repositories;
  • owlim:ChildSail - for child repositories that extend parent repositories.

(Where the owlim namespace above maps to http://www.ontotext.com/trree/owlim#.)

Additional configuration parameters:

  • owlim:id is used in the parent configuration to provide a nesting name;
  • owlim:parent-id is used in the child configurations to identify the parent repository.

Once created, a child repository must not be reconfigured to use a different parent repository as this leads to inconsistent data.

Note

When setting up several GraphDB instances to run in the same Java Virtual Machine, i.e., the JVM used to host Tomcat, make sure that the configured memory settings take into account the other repositories. For example, if setting up 3 GraphDB instances, configure them as though each of them had only one third of the total Java heap space available.

Initialisation and shut down

To initialise nested repositories correctly, start the parent repository followed by each of its children.

As long as no further updates occur, the shutdown sequence is not strictly defined. However, we recommend that you shut down the children first followed by the parent.