# Configuring a Repository

Before you add or change parameter values, plan your repository configuration: learn what each parameter does, what the configuration template is and how it works, which data structures GraphDB supports, and which configuration values are optimal for your setup.

## Plan a repository configuration

To plan your repository configuration, check out the following sections:

## Configure a repository through the GraphDB Workbench

To configure a new repository, complete the repository properties form.

Note

If you need a repository with enabled SHACL validation, you must enable this option at configuration time. SHACL validation cannot be enabled after the repository has been created.

## Edit a repository

Some of the parameters you specify at repository creation time can be changed at any point.

1. Click the Edit icon next to a repository to edit it.

2. Restart GraphDB for the changes to take effect.

Package: com.ontotext

MBean name: OwlimRepositoryManager

| Operation | Description |
| --- | --- |
| shutdownRepositoryInstance | Shuts down the repository instance. |

## Configure a repository programmatically

Tip

GraphDB uses an RDF4J configuration template for configuring its repositories. RDF4J keeps the repository configurations, together with their parameters, modeled in RDF. Therefore, to create a new repository, RDF4J needs such an RDF file. For more information on how the configuration template works, see Repository configuration template - how it works.

To configure a new repository programmatically:

1. Fill in the .ttl configuration template that can be found in the /configs/templates folder of the GraphDB distribution. The parameters are described in the Configuration parameters section.

# RDF4J configuration template for a GraphDB Free repository

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix owlim: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
    rep:repositoryID "graphdb-test" ;
    rdfs:label "GraphDB Free repository" ;
    rep:repositoryImpl [
        rep:repositoryType "graphdb:FreeSailRepository" ;
        sr:sailImpl [
            sail:sailType "graphdb:FreeSail" ;

            owlim:base-URL "http://example.org/graphdb#" ;
            owlim:defaultNS "" ;
            owlim:entity-index-size "10000000" ;
            owlim:entity-id-size "32" ;
            owlim:imports "" ;
            owlim:repository-type "file-repository" ;
            owlim:ruleset "rdfsplus-optimized" ;
            owlim:storage-folder "storage" ;

            owlim:enable-context-index "false" ;

            owlim:enablePredicateList "true" ;

            owlim:in-memory-literal-properties "true" ;
            owlim:enable-literal-index "true" ;

            owlim:check-for-inconsistencies "false" ;
            owlim:disable-sameAs "false" ;
            owlim:query-timeout "0" ;
            owlim:query-limit-results "0" ;
            owlim:throw-QueryEvaluationException-on-timeout "false" ;
        ]
    ].

2. Use this command to create a repository from config.ttl:

curl -X POST --header "Content-Type:multipart/form-data" -F "config=@./config.ttl" \
    "http://localhost:7200/rest/repositories"

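The same request can also be sent from a script. The following is a minimal Python sketch using only the standard library. It assumes GraphDB is reachable at http://localhost:7200, as in the curl command above. The helper names (`build_multipart`, `build_request`, `create_repository`) are hypothetical and are not part of GraphDB or RDF4J.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

# Assumption: same host and port as in the curl example.
GRAPHDB_URL = "http://localhost:7200"


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content type)."""
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the payload
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_request(config: bytes, base_url: str = GRAPHDB_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST that mirrors the curl command."""
    body, content_type = build_multipart("config", "config.ttl", config)
    return urllib.request.Request(
        f"{base_url}/rest/repositories",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def create_repository(config_path: str, base_url: str = GRAPHDB_URL) -> int:
    """Send the request to a running GraphDB instance and return the HTTP status."""
    request = build_request(Path(config_path).read_bytes(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `create_repository("./config.ttl")` requires a running GraphDB instance. Building the request separately from sending it makes the payload easy to inspect beforehand.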

## Configuration parameters

This is a list of all repository configuration parameters. Some of the parameters can be changed (effective after a restart), some cannot be changed (the change has no effect) and others need special attention once a repository has been created, as changing them will likely lead to inconsistent data (e.g., unsupported inferred statements, missing inferred statements, or inferred statements that cannot be deleted).

base-URL
Description: Specifies the default namespace for the main persistence file. Non-empty namespaces are recommended, because their use guarantees the uniqueness of the anonymous nodes that may appear within the repository.
Default value: none
Can be changed.
check-for-inconsistencies (see more)
Description: Enables or disables the mechanism for consistency checking. If this parameter is true, consistency checks are defined in the rule file and applied at the end of every transaction. If an inconsistency is detected while committing a transaction, the whole transaction will be rolled back.
Default value: false
Can be changed.
defaultNS
Description: Default namespaces corresponding to each imported schema file, separated by semicolons. The number of namespaces must be equal to the number of schema files from the imports parameter.
Default value: <empty>
Example: owlim:defaultNS "http://www.w3.org/2002/07/owl#;http://example.org/owlim#".
Cannot be changed.

Warning

This parameter cannot be set via a command line argument.

disable-sameAs (see more)
Description: Enables or disables the owl:sameAs optimization.
Default value: true
Its value in the UI can change depending on the selected ruleset.

Warning

This parameter needs special attention.

enable-context-index (see more)
Default value: false
Possible value: true, where GraphDB will build and use the context index.
Can be changed.
enable-literal-index (see more)
Description: Enables or disables the use of the literal index. The literal index is always built as data is loaded/modified. This parameter only affects whether the index is used during query answering.
Default value: true
Can be changed.
enablePredicateList (see more)
Description: Enables or disables mappings from an entity (subject or object) to its predicates; enabling it can significantly speed up queries that use wildcard predicate patterns.
Default value: true
Can be changed.
entity-id-size
Description: Defines the bit size of internal IDs used to index entities (URIs, blank nodes, literals, and RDF* embedded triples). In most cases, this parameter can be left at its default value. However, if using very large datasets containing over 2^31 entities, set this parameter to 40. Be aware that this can only be set when instantiating a new repository, and that converting an existing repository from 32 to 40-bit entity widths is not possible.
Default value: 32
Possible values: 32 and 40
Cannot be changed.
entity-index-size (see more)
Description: Defines the initial size of the entity hash table index entries. The bigger the size, the fewer the collisions in the hash table, and the faster the entity retrieval. The entity hash table will adapt to the number of stored entities once the number of collisions passes a critical threshold.
Default value: 10,000,000

Warning

This parameter cannot be changed by the user, once initially set.

imports
Description: A list of schema files that will be imported at startup. All statements found in these files will be loaded in the repository and will be treated as read-only. The serialization format is determined by the file extension:
• .brf => BinaryRDF

• .n3 => N3

• .nq => N-Quads

• .nt => N-Triples

• .owl => RDF/XML

• .rdf => RDF/XML

• .rdfs => RDF/XML

• .trig => TriG

• .trix => TriX

• .ttl => Turtle

• .xml => TriX

Default value: none
Example: owlim:imports "./ont/owl.rdfs;./ont/ex.rdfs".
Cannot be changed.

Tip

A schema file can be specified either as a local path name, e.g., ./ontology/myfile.rdf, or as a URL, e.g., http://www.w3.org/2002/07/owl.rdf. If this parameter is used, the default namespace for each imported schema file must be provided using the defaultNS parameter.

in-memory-literal-properties (see more)
Description: Enables or disables caching of the literal languages and data types. If the caching is on and the entity pool is restored from persistence, but there is no such cache available on disk, it is created after the entity pool initialization.
Default value: true
Can be changed.
nonInterpretablePredicates
Description: Semicolon-separated list of predicates (full URLs) that GraphDB will not try to process with the registered GraphDB plugins. (Predicates processed by registered plugins are often called “Magic” predicates.) This optimization speeds up data loading by providing a hint that these predicates are not magic.
Default value: http://www.w3.org/2000/01/rdf-schema#label;http://www.w3.org/1999/02/22-rdf-syntax-ns#type;http://www.ontotext.com/owlim/ces#gazetteerConfig;http://www.ontotext.com/owlim/ces#metadataConfig
query-limit-results
Description: Sets the maximum number of results returned from a query after which the evaluation of a query will be terminated; values less than or equal to zero mean no limit.
Default value: 0 (no limit)
Can be changed.
query-timeout (see more)
Description: Sets the number of seconds after which the evaluation of a query will be terminated; values less than or equal to zero mean no limit.
Default value: 0 (no limit)
Can be changed.
read-only
Description: In this mode, no modifications are allowed to the data or namespaces.
Default value: false
Possible value: true, puts the repository in read-only mode.
Can be changed.
repository-type
Default value: file-repository
Possible values: file-repository, weighted-file-repository.
Cannot be changed.
ruleset (see more)
Description: Sets of axiomatic triples, consistency checks and entailment rules, which determine the applied semantics.
Default value: rdfsplus-optimized
Possible values: empty, rdfs, owl-horst, owl-max and owl2-rl and their optimized counterparts rdfs-optimized, owl-horst-optimized, owl-max-optimized and owl2-rl-optimized. A custom ruleset is chosen by setting the path to its .pie rule file.

Warning

This parameter needs special attention.

storage-folder (see more)
Description: Specifies the folder where the index files will be stored.
Default value: none
Can be changed.
throw-QueryEvaluationException-on-timeout
Default value: false
Possible value: true; if set, a QueryEvaluationException is thrown when the duration of a query execution exceeds the timeout parameter.
Can be changed.
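Several of the constraints listed above can be checked before a configuration file is submitted. The following Python sketch is illustrative only: the function names are hypothetical, the checks cover only rules stated in this section (matching imports/defaultNS counts, recognized schema file extensions, and the allowed entity-id-size values), and GraphDB performs its own validation independently.

```python
import os
from typing import Dict, List

# Extension-to-format mapping copied from the imports parameter description.
FORMATS = {
    ".brf": "BinaryRDF", ".n3": "N3", ".nq": "N-Quads", ".nt": "N-Triples",
    ".owl": "RDF/XML", ".rdf": "RDF/XML", ".rdfs": "RDF/XML",
    ".trig": "TriG", ".trix": "TriX", ".ttl": "Turtle", ".xml": "TriX",
}


def split_list(value: str) -> List[str]:
    """Split a semicolon-separated parameter value, ignoring empty items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def import_format(path: str) -> str:
    """Return the serialization format implied by a schema file's extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in FORMATS:
        raise ValueError(f"unrecognized schema file extension: {path}")
    return FORMATS[extension]


def check_parameters(params: Dict[str, str]) -> List[str]:
    """Return a list of detected problems; an empty list means none were found."""
    problems = []
    imports = split_list(params.get("imports", ""))
    namespaces = split_list(params.get("defaultNS", ""))
    if len(imports) != len(namespaces):
        problems.append(
            f"imports lists {len(imports)} file(s) but defaultNS lists "
            f"{len(namespaces)} namespace(s)"
        )
    for path in imports:
        try:
            import_format(path)
        except ValueError as error:
            problems.append(str(error))
    if params.get("entity-id-size", "32") not in ("32", "40"):
        problems.append("entity-id-size must be 32 or 40")
    return problems


def suggested_entity_id_size(expected_entities: int) -> int:
    """Pick 40 bits only when the dataset is expected to exceed 2^31 entities."""
    return 40 if expected_entities > 2**31 else 32
```

For example, `check_parameters({"imports": "./ont/owl.rdfs", "defaultNS": ""})` reports one problem, because a schema file is imported without a matching default namespace.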

## Configure GraphDB memory

### Configure Java heap memory

The following diagram offers a view of the memory use by the GraphDB structures and processes:

To specify the maximum amount of heap space used by a JVM, use the -Xmx virtual machine parameter.

As a general rule, the -Xmx value should not exceed 2/3 of the system memory. For example, consider a system with a total of 8 gigabytes of RAM, where 1 gigabyte is used by the operating system, services, etc., and 1 gigabyte by the entity pool and the hash maps. Because the entity pool and the hash maps are off-heap, the JVM that hosts the application using GraphDB should, ideally, have a maximum heap size of 6 gigabytes, which can be set using the JVM argument -Xmx6g.
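The arithmetic behind the 8-gigabyte example can be written down as a short Python sketch. The function names are hypothetical, and the figures for the operating system and the off-heap structures are the estimates quoted above, not measured values.

```python
def max_heap_gb(total_gb: float, os_gb: float, off_heap_gb: float) -> float:
    """Memory left for the JVM heap after the OS and the off-heap structures."""
    heap = total_gb - os_gb - off_heap_gb
    if heap <= 0:
        raise ValueError("no memory left for the JVM heap")
    return heap


def xmx_flag(heap_gb: float) -> str:
    """Format the heap size as a JVM argument, rounded down to whole gigabytes."""
    return f"-Xmx{int(heap_gb)}g"


# 8 GB total, 1 GB for the OS and services, 1 GB for entity pool and hash maps.
print(xmx_flag(max_heap_gb(total_gb=8, os_gb=1, off_heap_gb=1)))  # prints -Xmx6g
```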

### Single global page cache

GraphDB’s cache strategy, the single global page cache, employs the concept of one global cache shared between all internal structures of all repositories. This way, you no longer have to configure the cache-memory, tuple-index-memory and predicate-memory parameters, or size every repository and calculate the amount of memory dedicated to it. If at a given moment one of the repositories is being used more, it will naturally get more slots in the cache.

The current global cache implementation can be enabled by specifying: -Dgraphdb.global.page.cache=true -Dgraphdb.page.cache.size=3G. If you do not specify page.cache.size but only enable the global cache, it will take 50% of the -Xmx parameter.
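As an illustration of the default described above, the following Python sketch computes the effective page cache size and assembles the corresponding JVM arguments. The function names are hypothetical, and only the flags quoted in this section are used.

```python
from typing import List, Optional


def effective_page_cache_gb(xmx_gb: float, page_cache_gb: Optional[float] = None) -> float:
    """Return the explicit cache size if given, otherwise 50% of the heap."""
    return page_cache_gb if page_cache_gb is not None else 0.5 * xmx_gb


def jvm_options(xmx_gb: int, page_cache_gb: Optional[int] = None) -> List[str]:
    """Assemble the heap and global page cache arguments as a list of strings."""
    options = [f"-Xmx{xmx_gb}g", "-Dgraphdb.global.page.cache=true"]
    if page_cache_gb is not None:
        # Omitting this flag leaves GraphDB to apply its 50%-of-heap default.
        options.append(f"-Dgraphdb.page.cache.size={page_cache_gb}G")
    return options


print(effective_page_cache_gb(10))      # prints 5.0
print(" ".join(jvm_options(10, 3)))
```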

Note

You do not have to change/edit your repository configurations. The new cache will be used when you upgrade to the new version.

### Configure Entity pool memory

By default, all entity pool structures reside off-heap, i.e., outside of the normal JVM heap. This way, you do not have to account for the entity pool memory when setting the JVM maximum heap memory parameter for GraphDB. This means, however, that you need to leave some memory outside of the -Xmx.

To activate the old behavior, you can still enable on-heap allocation with -Dgraphdb.epool.onheap=true.

If you are concerned that the process will consume an unlimited amount of memory, you can specify a maximum size with -XX:MaxDirectMemorySize, which defaults to the -Xmx parameter (at least in OpenJDK and Oracle JDK).

### Sample memory configuration

This is a sample configuration demonstrating how to correctly size a GraphDB server with a single repository. The loaded dataset is estimated at 500 million RDF statements and 150 million unique entities. As a rule of thumb, the ratio of unique entities to the total number of statements in a standard dataset is 1:3.

| Configuration parameter | Description | Example value |
| --- | --- | --- |
| Total OS memory | Total physical system memory | 16 GB |
| On-heap JVM (-Xmx) configuration | Maximum heap memory allocated by the JVM process | 10 GB |
| page.cache.size | Global single cache shared between all internal structures of all repositories (the default value is 50% of the heap size) | 5 GB |
| Remaining on-heap memory for query execution | Raw estimate of the memory for query execution; a higher value is required if many long-running analytical queries are expected | ~4.5 GB |
| entity-index-size (“Entity index size”), stored off-heap by default | Size of the initial entity pool hash table; the recommended value is equal to the total number of unique entities | 150,000,000 |
| Memory footprint of the entity pool, stored off-heap by default | Calculated from entity-index-size and total number of entities; this memory will be taken after the repository initialization | ~2.5 GB |
| Remaining OS memory | Raw estimate of the memory left to the OS | ~3.5 GB |
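The relationships between the sample values can be checked with a few lines of Python. This is a sketch with hypothetical names. The entity pool footprint (~2.5 GB) is taken from the sample as given rather than derived. The simple subtraction yields 5 GB of heap after the page cache, whereas the sample quotes a more conservative ~4.5 GB, presumably to leave some headroom.

```python
from typing import Dict


def memory_budget(total_gb: float, heap_gb: float, entity_pool_gb: float) -> Dict[str, float]:
    """Split the physical memory following the rules used in the sample."""
    page_cache_gb = 0.5 * heap_gb  # default page cache: 50% of the heap
    return {
        "page_cache_gb": page_cache_gb,
        # Upper bound for query execution; the sample quotes ~4.5 GB.
        "heap_after_cache_gb": heap_gb - page_cache_gb,
        # Whatever is not heap or off-heap entity pool is left to the OS.
        "os_remaining_gb": total_gb - heap_gb - entity_pool_gb,
    }


budget = memory_budget(total_gb=16.0, heap_gb=10.0, entity_pool_gb=2.5)
print(budget)
# {'page_cache_gb': 5.0, 'heap_after_cache_gb': 5.0, 'os_remaining_gb': 3.5}
```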

### Upper bounds for the memory consumed by the GraphDB process

To make sure that no OutOfMemoryError is thrown while working with an active GraphDB repository, set an upper bound for the memory consumed by all instances of the tupleSet/distinct collections. This is done with the -Ddefault.min.distinct.threshold parameter, whose default value is 250m and can be changed. If this value is exceeded, a QueryEvaluationException is thrown to avoid running out of memory because of a memory-hungry DISTINCT/GROUP BY operation.

## Reconfigure a repository

Once a repository is created, it is possible to change some parameters, either by editing it in the Workbench or by setting a global override for a given property.

Note

When you change a repository parameter, you need to restart GraphDB for the changes to take effect.

### Using the Workbench

To edit a repository parameter in the GraphDB Workbench, go to Setup -> Repositories and click the Edit icon for the repository whose parameters you want to edit.

### Global overrides

It is also possible to override a repository parameter for all repositories by setting a configuration or system property. See Engine properties for more details on how to do it.

## Rename a repository

### Using the Workbench

Use the Workbench to change the repository ID. This will update all locations in the Workbench where the repository name is used.