Configuring a repository

Before you start adding or changing parameter values, it is worth planning your repository configuration: learn what each parameter does, what the configuration template is and how it works, what data structures GraphDB supports, and which configuration values are optimal for your setup.

Plan a repository configuration

To plan your repository configuration, check out the following sections:

Configure a repository through the GraphDB Workbench

To configure a new repository, complete the repository properties form.

_images/addRepository_Free.png

Edit a repository

Some of the parameters you specify at repository creation time can be changed at any point.

  1. Click the edit icon next to a repository to edit it.
  2. Restart GraphDB for the changes to take effect.

Configure a repository programmatically

Tip

GraphDB uses an RDF4J configuration template for configuring its repositories. RDF4J keeps the repository configurations with their parameters, modelled in RDF, in the SYSTEM repository. Therefore, in order to create a new repository, RDF4J needs such an RDF file to populate the SYSTEM repository. For more information on how the configuration template works, see Repository configuration template - how it works.

To configure a new repository programmatically:

  1. Fill in the .ttl configuration template that can be found in the /templates folder of the GraphDB distribution. The parameters are described in the Configuration parameters section.

    # RDF4J configuration template for a GraphDB Free repository
    
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
    @prefix rep: <http://www.openrdf.org/config/repository#>.
    @prefix sr: <http://www.openrdf.org/config/repository/sail#>.
    @prefix sail: <http://www.openrdf.org/config/sail#>.
    @prefix owlim: <http://www.ontotext.com/trree/owlim#>.
    
    [] a rep:Repository ;
        rep:repositoryID "graphdb-test" ;
        rdfs:label "GraphDB Free repository" ;
        rep:repositoryImpl [
            rep:repositoryType "graphdb:FreeSailRepository" ;
            sr:sailImpl [
                sail:sailType "graphdb:FreeSail" ;
    
                owlim:base-URL "http://example.org/graphdb#" ;
                owlim:defaultNS "" ;
                owlim:entity-index-size "10000000" ;
                owlim:entity-id-size  "32" ;
                owlim:imports "" ;
                owlim:repository-type "file-repository" ;
                owlim:ruleset "rdfs-plus-optimized" ;
                owlim:storage-folder "storage" ;
    
                owlim:enable-context-index "false" ;
    
                owlim:enablePredicateList "true" ;
    
                owlim:in-memory-literal-properties "true" ;
                owlim:enable-literal-index "true" ;
    
                owlim:check-for-inconsistencies "false" ;
                owlim:disable-sameAs  "false" ;
                owlim:query-timeout  "0" ;
                owlim:query-limit-results  "0" ;
                owlim:throw-QueryEvaluationException-on-timeout "false" ;
                owlim:read-only "false" ;
                owlim:nonInterpretablePredicates "http://www.w3.org/2000/01/rdf-schema#label;http://www.w3.org/1999/02/22-rdf-syntax-ns#type;http://www.ontotext.com/owlim/ces#gazetteerConfig;http://www.ontotext.com/owlim/ces#metadataConfig" ;
            ]
        ].
    
  2. Update the RDF4J SYSTEM repository with this configuration by issuing the following command, replacing the filename (config.ttl), the URL of the remote server’s SYSTEM repository (http://server1:7200/repositories/SYSTEM) and the unique context in which to put the repository configuration (http://example.com#g1, with the # written as %23 inside the URL):

    curl -X POST -H "Content-Type:application/x-turtle" -T config.ttl \
      "http://server1:7200/repositories/SYSTEM/rdf-graphs/service?graph=http://example.com%23g1"
    
  3. Update the SYSTEM repository with a single statement to indicate that the unique context is an instance of sys:RepositoryContext:

    curl -X POST -H "Content-Type:application/x-turtle" \
      -d "<http://example.com#g1> a <http://www.openrdf.org/config/repository#RepositoryContext>." \
      http://server1:7200/repositories/SYSTEM/statements
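
Steps 2 and 3 can also be scripted. The following is a minimal Python sketch that only builds the two HTTP requests, using the server URL and context from the example above. The function names are illustrative, and passing the context as a percent-encoded graph query parameter is an assumption about how the graph store endpoint expects it.

```python
from urllib.parse import quote
from urllib.request import Request

# Class IRI taken from step 3 above.
REPOSITORY_CONTEXT = "http://www.openrdf.org/config/repository#RepositoryContext"


def build_config_upload(server: str, turtle: bytes, context: str) -> Request:
    """Step 2: POST the Turtle configuration into the graph named `context`."""
    # Assumption: the context is sent as a percent-encoded `graph` query parameter.
    url = (f"{server}/repositories/SYSTEM/rdf-graphs/service"
           f"?graph={quote(context, safe='')}")
    return Request(url, data=turtle, method="POST",
                   headers={"Content-Type": "application/x-turtle"})


def build_context_marker(server: str, context: str) -> Request:
    """Step 3: state that `context` is an instance of RepositoryContext."""
    statement = f"<{context}> a <{REPOSITORY_CONTEXT}> ."
    return Request(f"{server}/repositories/SYSTEM/statements",
                   data=statement.encode("utf-8"), method="POST",
                   headers={"Content-Type": "application/x-turtle"})


# Placeholder bytes stand in for the contents of config.ttl.
upload = build_config_upload("http://server1:7200", b"# contents of config.ttl",
                             "http://example.com#g1")
marker = build_context_marker("http://server1:7200", "http://example.com#g1")
print(upload.get_method(), upload.full_url)
print(marker.get_method(), marker.full_url)
```

To actually send the requests, each object can be passed to urllib.request.urlopen; the sketch stops short of that so it can be inspected without a running server.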
    

Configuration parameters

This is a list of all repository configuration parameters. Some of the parameters can be changed (effective after a restart), some cannot be changed (the change has no effect), and others need special attention once a repository has been created, as changing them will likely lead to inconsistent data (e.g., unsupported inferred statements, missing inferred statements, or inferred statements that cannot be deleted).

base-URL (Can be changed)
Description: Specifies the default namespace for the main persistence file. Non-empty namespaces are recommended, because their use guarantees the uniqueness of the anonymous nodes that may appear within the repository.
Default value: none
defaultNS (Cannot be changed)
Description: Default namespaces corresponding to each imported schema file, separated by semicolons. The number of namespaces must be equal to the number of schema files in the imports parameter.
Default value: <empty>
Example: owlim:defaultNS "http://www.w3.org/2002/07/owl#;http://example.org/owlim#".

Warning

This parameter cannot be set via a command line argument.

entity-index-size (Cannot be changed by the user once initially set)
Description: Defines the initial size of the entity hash table index entries. The bigger the size, the fewer the collisions in the hash table and the faster the entity retrieval. The entity hash table will adapt to the number of stored entities once the number of collisions passes a critical threshold.
Default value: 10000000
entity-id-size (Cannot be changed)
Description: Defines the bit size of internal IDs used to index entities (URIs, blank nodes and literals). In most cases, this parameter can be left at its default value. However, if very large datasets containing more than 2^31 entities are used, set this parameter to 40. Be aware that this can only be set when instantiating a new repository; converting an existing repository from 32-bit to 40-bit entity widths is not possible.
Default value: 32
Possible values: 32 and 40
imports (Cannot be changed)
Description: A list of schema files that will be imported at start up. All the statements, found in these files, will be loaded in the repository and will be treated as read-only. The serialisation format is determined by the file extension:
  • .brf => BinaryRDF
  • .n3 => N3
  • .nq => N-Quads
  • .nt => N-Triples
  • .owl => RDF/XML
  • .rdf => RDF/XML
  • .rdfs => RDF/XML
  • .trig => TriG
  • .trix => TriX
  • .ttl => Turtle
  • .xml => TriX
Default value: none
Example: owlim:imports "./ont/owl.rdfs;./ont/ex.rdfs".

Tip

Schema files can be either a local path name, e.g., ./ontology/myfile.rdf or a URL, e.g., http://www.w3.org/2002/07/owl.rdf. If this parameter is used, the default namespace for each imported schema file must be provided using the defaultNS parameter.
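
The pairing rule between imports and defaultNS can be checked before a repository is created. The sketch below is a hypothetical pre-flight helper (pair_imports is not part of GraphDB): it splits both values on semicolons, verifies that the counts match, and looks up the serialisation format from the file extension using the table above.

```python
import os
from typing import List, Tuple

# Extension-to-format mapping copied from the imports parameter description.
FORMATS = {
    ".brf": "BinaryRDF", ".n3": "N3", ".nq": "N-Quads", ".nt": "N-Triples",
    ".owl": "RDF/XML", ".rdf": "RDF/XML", ".rdfs": "RDF/XML",
    ".trig": "TriG", ".trix": "TriX", ".ttl": "Turtle", ".xml": "TriX",
}


def pair_imports(imports: str, default_ns: str) -> List[Tuple[str, str, str]]:
    """Return (schema file, namespace, format) triples, or raise ValueError."""
    files = [f for f in imports.split(";") if f]
    namespaces = [ns for ns in default_ns.split(";") if ns]
    if len(files) != len(namespaces):
        raise ValueError(
            f"{len(files)} schema files but {len(namespaces)} namespaces")
    result = []
    for path, ns in zip(files, namespaces):
        ext = os.path.splitext(path)[1].lower()
        if ext not in FORMATS:
            raise ValueError(f"unrecognised extension for {path!r}")
        result.append((path, ns, FORMATS[ext]))
    return result


# Values taken from the imports and defaultNS examples above.
pairs = pair_imports("./ont/owl.rdfs;./ont/ex.rdfs",
                     "http://www.w3.org/2002/07/owl#;http://example.org/owlim#")
for path, ns, fmt in pairs:
    print(path, ns, fmt)
```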

repository-type (Cannot be changed)
Default value: file-repository
Possible values: file-repository, weighted-file-repository.
ruleset (Needs special attention)
Description: Sets of axiomatic triples, consistency checks and entailment rules, which determine the applied semantics.
Default value: rdfs-plus-optimized
Possible values: empty, rdfs, owl-horst, owl-max, and owl2-rl, and their optimised counterparts rdfs-optimized, owl-horst-optimized, owl-max-optimized, and owl2-rl-optimized. A custom ruleset is chosen by setting the path to its rule file (.pie).
storage-folder (Can be changed)
Description: Specifies the folder where the index files will be stored.
Default value: none
enable-context-index (Can be changed)
Default value: false
Possible value: true, in which case GraphDB builds and uses the context index.
enablePredicateList (Can be changed)
Description: Enables or disables mappings from an entity (subject or object) to its predicates; switching this on can significantly speed up queries that use wildcard predicate patterns.
Default value: false
in-memory-literal-properties (Can be changed)
Description: Turns caching of the literal languages and data-types on and off. If the caching is on and the entity pool is restored from persistence, but there is no such cache available on disk, it is created after the entity pool initialisation.
Default value: false
enable-literal-index (Can be changed)
Description: Enables or disables the use of the literal index. The literal index is always built as data is loaded or modified; this parameter only affects whether the index is used during query answering.
Default value: true
check-for-inconsistencies (Can be changed)
Description: Turns the mechanism for consistency checking on and off; consistency checks are defined in the rule file and are applied at the end of every transaction, if this parameter is true. If an inconsistency is detected when committing a transaction, the whole transaction will be rolled back.
Default value: false
disable-sameAs (Needs special attention)
Description: Enables or disables the owl:sameAs optimisation.
Default value: false
query-timeout (Can be changed)
Description: Sets the number of seconds after which the evaluation of a query will be terminated; values less than or equal to zero mean no limit.
Default value: 0 (no limit)
query-limit-results (Can be changed)
Description: Sets the maximum number of results returned from a query after which the evaluation of a query will be terminated; values less than or equal to zero mean no limit.
Default value: 0 (no limit)
throw-QueryEvaluationException-on-timeout (Can be changed)
Default value: false
Possible value: true; if set, a QueryEvaluationException is thrown when the duration of a query execution exceeds the query-timeout value.
read-only (Can be changed)
Description: In this mode, no modifications are allowed to the data or namespaces.
Default value: false
Possible value: true, which puts the repository into read-only mode.
Non-interpretable predicates
Description: Semicolon-separated list of predicates (full URLs) that GraphDB will not try to process with the registered GraphDB plugins. (Predicates processed by registered plugins are often called “magic” predicates.) This optimisation speeds up data loading by providing a hint that these predicates are not magic.
Default value: http://www.w3.org/2000/01/rdf-schema#label;http://www.w3.org/1999/02/22-rdf-syntax-ns#type;http://www.ontotext.com/owlim/ces#gazetteerConfig;http://www.ontotext.com/owlim/ces#metadataConfig
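
Because the two entity parameters cannot be changed once a repository exists, it can help to derive them from the expected dataset size up front. This is a minimal sketch, assuming the 2^31 threshold from the entity-id-size description and the recommendation (given with the sample memory configuration in this document) that entity-index-size equals the number of unique entities. The helper name entity_settings is hypothetical.

```python
# Threshold quoted in the entity-id-size description.
ENTITY_ID_LIMIT_32 = 2 ** 31


def entity_settings(expected_entities: int) -> dict:
    """Suggest entity-id-size and entity-index-size for a new repository."""
    return {
        # 40-bit IDs are only needed above 2^31 entities.
        "entity-id-size": "40" if expected_entities > ENTITY_ID_LIMIT_32 else "32",
        # Assumption: size the initial hash table to the expected entity count.
        "entity-index-size": str(expected_entities),
    }


settings = entity_settings(150_000_000)
for name, value in settings.items():
    # Render each setting in the form used by the configuration template.
    print(f'owlim:{name} "{value}" ;')
```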

Configure GraphDB memory

Configure Java heap memory

The following diagram offers a view of the memory use by the GraphDB structures and processes:

_images/total_JAVA_Heap_Memory.png

To specify the maximum amount of heap space used by a JVM, use the -Xmx virtual machine parameter.

The Xmx value should be about 2/3 of the system memory. For example, if a system has 8 GB of RAM in total, of which 1 GB is used by the operating system, services, etc., and 1 GB by the entity pool and the hash maps (which are stored off-heap), then the JVM that hosts the application using GraphDB should ideally have a maximum heap size of 6 GB, set with the JVM argument -Xmx6g.
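
The arithmetic behind this example can be written out as a short sketch. The function and parameter names are illustrative; the numbers are the ones from the paragraph above.

```python
def max_heap_gb(total_ram_gb: int, os_and_services_gb: int, off_heap_gb: int) -> int:
    """Heap budget = total RAM minus OS/services minus off-heap structures."""
    return total_ram_gb - os_and_services_gb - off_heap_gb


# 8 GB machine, 1 GB for the OS and services, 1 GB for the off-heap
# entity pool and hash maps.
heap_gb = max_heap_gb(total_ram_gb=8, os_and_services_gb=1, off_heap_gb=1)
xmx_flag = f"-Xmx{heap_gb}g"
print(xmx_flag)  # -Xmx6g
```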

Single global page cache

In GraphDB 7.2, we introduce a new cache strategy called single global page cache. It means that there is one global cache shared between all internal structures of all repositories and you no longer have to configure the cache-memory, tuple-index-memory and predicate-memory, or size every repository and calculate the amount of memory dedicated to it. If one of the repositories is used more at the moment, it naturally gets more slots in the cache.

The current global cache implementation can be enabled by specifying: -Dgraphdb.global.page.cache=true -Dgraphdb.page.cache.size=3G. If you enable the global cache without specifying page.cache.size, it takes 50% of the Xmx value.

Note

You don’t have to change or edit your repository configurations; the new cache is used when you upgrade to the new version.
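
The defaulting rule for the page cache size can be illustrated with a small sketch. The helper page_cache_settings is hypothetical; it only assembles the two system properties named above and reports the cache size that would result, assuming sizes are given in whole gigabytes.

```python
from typing import List, Optional, Tuple


def page_cache_settings(heap_gb: int,
                        cache_gb: Optional[int] = None) -> Tuple[List[str], float]:
    """Return the JVM options and the effective page cache size in GB."""
    options = ["-Dgraphdb.global.page.cache=true"]
    if cache_gb is None:
        # No explicit size: the cache takes 50% of the Xmx value.
        effective = heap_gb * 0.5
    else:
        options.append(f"-Dgraphdb.page.cache.size={cache_gb}G")
        effective = float(cache_gb)
    return options, effective


explicit = page_cache_settings(heap_gb=6, cache_gb=3)
defaulted = page_cache_settings(heap_gb=10)
print(explicit)
print(defaulted)
```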

Configure Entity pool memory

From GraphDB 7.2 on, you no longer have to calculate the entity pool memory when giving the JVM max heap memory parameter to GraphDB. All entity pool structures now reside off-heap, i.e. outside of the normal JVM heap.

This means, however, that you need to leave some memory outside of the Xmx.

To restore the old behaviour, you can still enable on-heap allocation with:

-Dgraphdb.epool.onheap=true

If you are concerned that the process will consume an unlimited amount of memory, you can specify a maximum size with -XX:MaxDirectMemorySize, which defaults to the Xmx value (at least in OpenJDK and Oracle JDK).
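
As a rough illustration, the options discussed in this section can be assembled as follows. The helper jvm_memory_options and its parameters are hypothetical; only the flags themselves come from the text above.

```python
from typing import List, Optional


def jvm_memory_options(heap_gb: int,
                       direct_gb: Optional[int] = None,
                       epool_on_heap: bool = False) -> List[str]:
    """Build the memory-related JVM options for a GraphDB process."""
    options = [f"-Xmx{heap_gb}g"]
    if direct_gb is not None:
        # Cap off-heap (direct) memory instead of relying on the default.
        options.append(f"-XX:MaxDirectMemorySize={direct_gb}g")
    if epool_on_heap:
        # Old behaviour: keep the entity pool inside the JVM heap.
        options.append("-Dgraphdb.epool.onheap=true")
    return options


capped = jvm_memory_options(heap_gb=10, direct_gb=4)
legacy = jvm_memory_options(heap_gb=10, epool_on_heap=True)
print(" ".join(capped))
print(" ".join(legacy))
```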

Sample memory configuration

This is a sample configuration demonstrating how to correctly size a GraphDB server with a single repository. The loaded dataset is estimated at 500M RDF statements and 150M unique entities. As a rule of thumb, the ratio of unique entities to total statements in a standard dataset is 1:3.

Total OS memory
Description: Total physical system memory
Example value: 16 GB
On heap JVM (-Xmx) configuration
Description: Maximum heap memory allocated by the JVM process
Example value: 10 GB
page.cache.size
Description: Global single cache shared between all internal structures of all repositories (the default value is 50% of the heap size)
Example value: 5 GB
Remaining on-heap memory for query execution
Description: Rough estimate of the memory for query execution; a higher value is required if many long-running analytical queries are expected
Example value: ~4.5 GB
entity-index-size (“Entity index size”), stored off-heap by default
Description: Size of the initial entity pool hash table; the recommended value is equal to the total number of unique entities
Example value: 150000000
Memory footprint of the entity pool, stored off-heap by default
Description: Calculated from entity-index-size and the total number of entities; this memory will be taken after the repository initialisation
Example value: ~2.5 GB
Remaining OS memory
Description: Rough estimate of the memory left to the OS
Example value: ~3.5 GB
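
The figures in this sample add up as sketched below. The variable names are illustrative, and the entity pool footprint is simply taken from the sample rather than computed. Note that heap minus page cache gives 5 GB, an upper bound on the on-heap memory left for queries; the sample's ~4.5 GB presumably leaves some margin for other on-heap structures, which is an assumption here.

```python
total_os_gb = 16.0
heap_gb = 10.0                                  # -Xmx10g

page_cache_gb = heap_gb * 0.5                   # default: 50% of the heap
remaining_heap_gb = heap_gb - page_cache_gb     # upper bound for query execution

entity_pool_gb = 2.5                            # off-heap, value from the sample
remaining_os_gb = total_os_gb - heap_gb - entity_pool_gb

# Rule of thumb: unique entities to statements is roughly 1:3.
entity_ratio = 150_000_000 / 500_000_000

print(page_cache_gb, remaining_heap_gb, remaining_os_gb, entity_ratio)
```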

Reconfigure a repository

Once a repository is created, it is possible to change some parameters, either by editing it in the Workbench, by changing the configuration in the SYSTEM repository or by setting a global override for a given property.

Note

When you change a repository parameter, you have to restart GraphDB for the changes to take effect.

Using the Workbench

To edit a repository parameter in the GraphDB Workbench, go to Admin -> Repositories and click the edit icon for the repository whose parameters you want to edit. A form opens where you can edit them. Click the Save button to save your changes.

In the SYSTEM repository

Changing the configuration in the SYSTEM repository is generally not recommended as a simple error might corrupt your repository configuration.

The configurations are usually structured using blank node identifiers, which are always unique, so attempting to modify a statement with a blank node by using the same blank node identifier will fail. However, this can be achieved with SPARQL UPDATE using a DELETE-INSERT-WHERE command.

PREFIX sys:  <http://www.openrdf.org/config/repository#>
PREFIX sail: <http://www.openrdf.org/config/repository/sail#>
PREFIX onto: <http://www.ontotext.com/trree/owlim#>
DELETE { GRAPH ?g {?sail ?param ?old_value } }
INSERT { GRAPH ?g {?sail ?param ?new_value } }
WHERE {
  GRAPH ?g { ?rep sys:repositoryID ?id . }
  GRAPH ?g { ?rep sys:repositoryImpl ?impl . }
  GRAPH ?g { ?impl sys:repositoryType ?type . }
  GRAPH ?g { ?impl sail:sailImpl ?sail . }
  GRAPH ?g { ?sail ?param ?old_value . }
  FILTER( ?id = "repo_id" ) .
  FILTER( ?param = onto:enable-context-index ) .
  BIND( "true" AS ?new_value ) .
}
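
The update above can be parameterised and sent over HTTP. The following is a minimal sketch that fills in the repository ID, parameter name and new value, then builds (but does not send) a form-encoded POST to the SYSTEM repository's statements endpoint. The function name, the input checks and the use of an update form field are assumptions for illustration, not a documented GraphDB client.

```python
import re
from string import Template
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Same update as above, with $-placeholders for the three variable parts.
UPDATE = Template("""\
PREFIX sys:  <http://www.openrdf.org/config/repository#>
PREFIX sail: <http://www.openrdf.org/config/repository/sail#>
PREFIX onto: <http://www.ontotext.com/trree/owlim#>
DELETE { GRAPH ?g {?sail ?param ?old_value } }
INSERT { GRAPH ?g {?sail ?param ?new_value } }
WHERE {
  GRAPH ?g { ?rep sys:repositoryID ?id . }
  GRAPH ?g { ?rep sys:repositoryImpl ?impl . }
  GRAPH ?g { ?impl sys:repositoryType ?type . }
  GRAPH ?g { ?impl sail:sailImpl ?sail . }
  GRAPH ?g { ?sail ?param ?old_value . }
  FILTER( ?id = "$repo_id" ) .
  FILTER( ?param = onto:$param ) .
  BIND( "$value" AS ?new_value ) .
}
""")


def build_parameter_update(server: str, repo_id: str, param: str, value: str) -> Request:
    """Build a POST request that changes one parameter of one repository."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z-]*", param):
        raise ValueError(f"unexpected parameter name: {param!r}")
    for text in (repo_id, value):
        # Keep the string literals well-formed: no quotes, backslashes, newlines.
        if re.search(r'["\\\n\r]', text):
            raise ValueError(f"unsafe literal: {text!r}")
    update = UPDATE.substitute(repo_id=repo_id, param=param, value=value)
    body = urlencode({"update": update}).encode("ascii")
    return Request(f"{server}/repositories/SYSTEM/statements", data=body,
                   method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


request = build_parameter_update("http://server1:7200", "repo_id",
                                 "enable-context-index", "true")
# Decode the body again to show the SPARQL text that would be sent.
print(parse_qs(request.data.decode("ascii"))["update"][0])
```

As with any edit of the SYSTEM repository, it is safer to inspect the generated update before sending it with urllib.request.urlopen.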

Warning

Some parameters cannot be changed after a repository has been created. Changing them either has no effect (once the relevant data structures are built, their structure cannot be changed) or causes inconsistencies (these parameters affect the reasoner).

Global overrides

It is also possible to override a repository parameter for all repositories by setting a configuration or system property. See Engine properties for more information.

Rename a repository

Using the Workbench

Use the Workbench to change the Repository ID field. The Workbench carries out the steps described below correctly and updates every place in the Workbench where the repository name is used.

Editing of the SYSTEM repository

Warning

Changing the SYSTEM repository is generally not recommended as a simple error might corrupt your repository configuration.

For an existing repository that has already been used:

  1. Restart GraphDB to ensure that the repository is not loaded into memory (with locked/open files).

  2. Select the SYSTEM repository.

  3. Execute the following SPARQL update with the appropriate old and new names substituted in the last two lines.

    PREFIX sys:<http://www.openrdf.org/config/repository#>
    DELETE { GRAPH ?g { ?repository sys:repositoryID ?old_name } }
    INSERT { GRAPH ?g { ?repository sys:repositoryID ?new_name } }
    WHERE {
      GRAPH ?g { ?repository a sys:Repository . }
      GRAPH ?g { ?repository sys:repositoryID ?old_name . }
      FILTER( ?old_name = "old_repository_name" ) .
      BIND( "new_repository_name" AS ?new_name ) . }
    
  4. Rename the folder for this repository in the file system.

    Please refer to Configuring the GraphDB data directory for more information on how to find the location of your repositories on the disk.

    Note

    There is another consideration regarding the storage folder parameter (http://www.ontotext.com/trree/owlim#storage-folder).
    If it is set to an absolute pathname, moving the repository requires an update of this parameter as well, so you will need to set it to the new location (with the new name).