Configuring a Repository¶
Before you start adding or changing the parameter values, we recommend planning your repository configuration and familiarizing yourself with what each of the parameters does, what the configuration template is and how it works, what data structures GraphDB supports, what configuration values are optimal for your setup, etc.
Plan a repository configuration¶
To plan your repository configuration, check out the following sections:
Configure a repository through the GraphDB Workbench¶
To configure a new repository, complete its properties form.
Note
If you need a repository with enabled SHACL validation, you must enable this option at configuration time. SHACL validation cannot be enabled after the repository has been created.
Edit a repository¶
Some of the parameters you specify at repository creation time can be changed at any point.
Click the Edit icon next to a repository to edit it.
Restart GraphDB for the changes to take effect.
Configure a repository programmatically¶
Tip
GraphDB uses an RDF4J configuration template for configuring its repositories. RDF4J keeps the repository configurations with their parameters, modeled in RDF. Therefore, to create a new repository, RDF4J needs such an RDF file. For more information on how the configuration template works, see Repository configuration template - how it works.
To configure a new repository programmatically:
Fill in the .ttl configuration template that can be found in the /configs/templates folder of the GraphDB distribution. The parameters are described in the Configuration parameters section.

    # RDF4J configuration template for a GraphDB repository

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
    @prefix rep: <http://www.openrdf.org/config/repository#>.
    @prefix sr: <http://www.openrdf.org/config/repository/sail#>.
    @prefix sail: <http://www.openrdf.org/config/sail#>.
    @prefix graphdb: <http://www.ontotext.com/config/graphdb#>.

    [] a rep:Repository ;
        rep:repositoryID "gdb" ;
        rdfs:label "" ;
        rep:repositoryImpl [
            rep:repositoryType "graphdb:SailRepository" ;
            sr:sailImpl [
                sail:sailType "graphdb:Sail" ;

                graphdb:base-URL "http://example.org/" ;
                graphdb:defaultNS "" ;
                graphdb:entity-index-size "10000000" ;
                graphdb:entity-id-size "32" ;
                graphdb:imports "" ;
                graphdb:repository-type "file-repository" ;
                graphdb:ruleset "owl-horst-optimized" ;
                graphdb:storage-folder "storage" ;

                graphdb:enable-context-index "false" ;

                graphdb:enablePredicateList "true" ;

                graphdb:in-memory-literal-properties "true" ;
                graphdb:enable-literal-index "true" ;

                graphdb:check-for-inconsistencies "false" ;
                graphdb:disable-sameAs "true" ;
                graphdb:query-timeout "0" ;
                graphdb:query-limit-results "0" ;
                graphdb:throw-QueryEvaluationException-on-timeout "false" ;
                graphdb:read-only "false" ;
            ]
        ].
To configure a repository with SHACL validation enabled programmatically, do the same as above, but with the added SHACL parameters:

    # RDF4J configuration template for a GraphDB repository with SHACL validation support

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
    @prefix rep: <http://www.openrdf.org/config/repository#>.
    @prefix sr: <http://www.openrdf.org/config/repository/sail#>.
    @prefix sail: <http://www.openrdf.org/config/sail#>.
    @prefix graphdb: <http://www.ontotext.com/config/graphdb#>.
    @prefix shacl: <http://rdf4j.org/config/sail/shacl#>.
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

    [] a rep:Repository ;
        rep:repositoryID "graphdb-preload-test" ;
        rdfs:label "GraphDB repository" ;
        rep:repositoryImpl [
            rep:repositoryType "graphdb:SailRepository" ;
            sr:sailImpl [
                sail:sailType "rdf4j:ShaclSail" ;

                shacl:validationEnabled "true" ;
                shacl:logValidationPlans "false" ;
                shacl:logValidationViolations "false" ;
                shacl:parallelValidation "true" ;
                shacl:globalLogValidationExecution "false" ;
                shacl:cacheSelectNodes "true" ;
                shacl:undefinedTargetValidatesAllSubjects "false" ;
                shacl:ignoreNoShapesLoadedException "false" ;
                shacl:performanceLogging "false" ;
                shacl:rdfsSubClassReasoning "true" ;
                shacl:eclipseRdf4jShaclExtensions "true" ;
                shacl:dashDataShapes "true" ;
                shacl:validationResultsLimitPerConstraint "-1"^^xsd:long ;
                shacl:validationResultsLimitTotal "-1"^^xsd:long ;

                sail:delegate [
                    graphdb:base-URL "http://example.org/" ;
                    graphdb:check-for-inconsistencies "false" ;
                    graphdb:defaultNS "" ;
                    graphdb:disable-sameAs "true" ;
                    graphdb:enable-context-index "false" ;
                    graphdb:enable-literal-index "true" ;
                    graphdb:enablePredicateList "true" ;
                    graphdb:entity-id-size "32" ;
                    graphdb:entity-index-size "10000000" ;
                    graphdb:imports "" ;
                    graphdb:in-memory-literal-properties "true" ;
                    graphdb:query-limit-results "0" ;
                    graphdb:query-timeout "0" ;
                    graphdb:read-only "false" ;
                    graphdb:repository-type "file-repository" ;
                    graphdb:ruleset "rdfsplus-optimized" ;
                    graphdb:storage-folder "storage" ;
                    graphdb:throw-QueryEvaluationException-on-timeout "false" ;
                    sail:sailType "graphdb:Sail"
                ]
            ]
        ].
Use this command to create a repository from the config.ttl:

    curl -X POST --header "Content-Type:multipart/form-data" -F "config=@./config.ttl" "http://localhost:7200/rest/repositories"
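The same request can also be assembled from a script. The following is a minimal sketch using only the Python standard library, not an official client: the function name `build_create_repo_request` is hypothetical, and the `application/octet-stream` part type is an assumption about what the endpoint accepts. The URL and the `config` form field are taken from the curl command above.

```python
import urllib.request
import uuid


def build_create_repo_request(config_ttl: bytes,
                              base_url: str = "http://localhost:7200") -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying a repository config."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    body = b"\r\n".join([
        delimiter,
        # Form field name "config" matches the -F "config=@./config.ttl" argument.
        b'Content-Disposition: form-data; name="config"; filename="config.ttl"',
        # Assumption: a generic binary content type is acceptable for the file part.
        b"Content-Type: application/octet-stream",
        b"",
        config_ttl,
        delimiter + b"--",
        b"",
    ])
    return urllib.request.Request(
        url=base_url + "/rest/repositories",
        data=body,
        method="POST",
        headers={"Content-Type": "multipart/form-data; boundary=" + boundary},
    )


# To send the request against a running instance (not executed here):
#   with open("config.ttl", "rb") as f:
#       urllib.request.urlopen(build_create_repo_request(f.read()))
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body before anything reaches the server.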
Configuration parameters¶
This is a list of all repository configuration parameters. Some of the parameters can be changed (effective after a restart), some cannot be changed (the change has no effect) and others need special attention once a repository has been created, as changing them will likely lead to inconsistent data (e.g., unsupported inferred statements, missing inferred statements, or inferred statements that cannot be deleted).
Parameter name | Description | Default value |
---|---|---|
base-URL | Specifies the default namespace for the main persistence file. Non-empty namespaces are recommended, because their use guarantees the uniqueness of the anonymous nodes that may appear within the repository. | none. Can be changed. |
check-for-inconsistencies | Enables or disables the mechanism for consistency checking. If this parameter is … | false. Can be changed. |
defaultNS | Default namespaces corresponding to each imported schema file, separated by semicolon. The number of namespaces must be equal to the number of schema files from the imports parameter. Example: … Warning: This parameter cannot be set via a command line argument. | <empty>. Cannot be changed. |
disable-sameAs | Enables or disables the … Warning: This parameter needs special attention. | true. Can change in the UI depending on the ruleset. |
enable-context-index | Possible value: … | false. Can be changed. |
enable-literal-index | Enables or disables the storage. The literal index is always built as data is loaded/modified. This parameter only affects whether the index is used during query answering. | true. Can be changed. |
enablePredicateList | Enables or disables mappings from an entity (subject or object) to its predicates; enabling it can significantly speed up queries that use wildcard predicate patterns. | true. Can be changed. |
entity-id-size | Defines the bit size of internal IDs used to index entities. This parameter can be left at its default value. However, if using very large datasets containing over 2^31 entities, set this parameter to 40. Possible values: 32 and 40. | 32. Cannot be changed. |
entity-index-size | Warning: Once initially set, this parameter cannot be changed by the user. | |
imports | A list of schema files that will be imported at startup. All statements found in these files will be loaded in the repository and will be treated as read-only. The serialization format is determined by the file extension. Example: graphdb:imports "./ont/owl.rdfs;./ont/ex.rdfs". Tip: Schema files can be either a local path name, e.g., ./ontology/myfile.rdf, or a URL, e.g., http://www.w3.org/2002/07/owl.rdf. If this parameter is used, the default namespace for each imported schema file must be provided using the defaultNS parameter. | none. Cannot be changed. |
in-memory-literal-properties | Enables or disables caching of the literal languages and data types. If the caching is on and the entity pool is restored from persistence, but there is no such cache available on disk, it is created after the entity pool initialization. | true. Can be changed. |
 | Semicolon-separated list of predicates (full URLs) that GraphDB will not try to process with the registered GraphDB plugins. (Predicates processed by registered plugins are often called “Magic” predicates.) This optimization will speed up the data loading by providing a hint that these predicates are not magic. | http://www.w3.org/2000/01/rdf-schema#label;http://www.w3.org/1999/02/22-rdf-syntax-ns#type;http://www.ontotext.com/owlim/ces#gazetteerConfig;http://www.ontotext.com/owlim/ces#metadataConfig |
query-limit-results | Sets the maximum number of results returned from a query, after which the evaluation of the query will be terminated; values less than or equal to zero mean no limit. | 0 (no limit). Can be changed. |
query-timeout | Sets the number of seconds after which the evaluation of a query will be terminated; values less than or equal to zero mean no limit. | 0 (no limit). Can be changed. |
read-only | In this mode, no modifications to the data or namespaces are allowed. Possible value: true, which puts the repository in read-only mode. | false. Can be changed. |
repository-type | Possible values: file-repository, weighted-file-repository. | file-repository. Cannot be changed. |
ruleset | Sets of axiomatic triples, consistency checks and entailment rules, which determine the applied semantics. Possible values: empty, rdfs, owl-horst, owl-max, owl2-rl, and their optimized counterparts rdfs-optimized, owl-horst-optimized, owl-max-optimized, and owl2-rl-optimized. A custom ruleset is chosen by setting the path to its rule file (.pie). Warning: This parameter needs special attention. | |
storage-folder | Specifies the folder where the index files will be stored. | none. Can be changed. |
throw-QueryEvaluationException-on-timeout | Possible value: … | false. Can be changed. |
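Two of the rules stated in the table lend themselves to a quick check before a configuration file is submitted: entity-id-size should be 40 for datasets with more than 2^31 entities, and the number of defaultNS namespaces must equal the number of imports files. The sketch below illustrates both; the function names are hypothetical helpers, not part of GraphDB, and the two namespace URLs in the usage note are placeholders.

```python
from typing import List

# Threshold taken from the entity-id-size description: more than 2^31 entities.
ENTITY_ID_SIZE_THRESHOLD = 2 ** 31


def pick_entity_id_size(expected_entities: int) -> int:
    """Return 40 for very large datasets, otherwise the default of 32."""
    return 40 if expected_entities > ENTITY_ID_SIZE_THRESHOLD else 32


def split_semicolon_list(value: str) -> List[str]:
    """Split a semicolon-separated parameter value, ignoring empty entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


def imports_match_namespaces(imports: str, default_ns: str) -> bool:
    """Check that every imported schema file has a corresponding default namespace."""
    return len(split_semicolon_list(imports)) == len(split_semicolon_list(default_ns))
```

For instance, `imports_match_namespaces("./ont/owl.rdfs;./ont/ex.rdfs", "http://example.org/a#;http://example.org/b#")` passes the check, while supplying only one namespace for the same two files does not.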
Configure GraphDB memory¶
Configure Java heap memory¶
The following diagram offers a view of the memory use by the GraphDB structures and processes:

To specify the maximum amount of heap space used by a JVM, use the -Xmx virtual machine parameter.
As a general rule, the -Xmx value should not exceed 2/3 of the system memory. For example, on a system with a total of 8 gigabytes of RAM, where 1 gigabyte is used by the operating system, services, etc., and 1 gigabyte by the entity pool and the hash maps (which are off-heap), the JVM that hosts the application using GraphDB should, ideally, have a maximum heap size of 6 gigabytes, set using the JVM argument -Xmx6g.
Single global page cache¶
GraphDB’s cache strategy, the single global page cache, employs the concept of one global cache shared between all internal structures of all repositories. This way, you no longer have to configure the cache-memory, tuple-index-memory and predicate-memory, or size every repository and calculate the amount of memory dedicated to it. If at a given moment one of the repositories is being used more, it will naturally get more slots in the cache.
The current global cache implementation can be enabled by specifying: -Dgraphdb.global.page.cache=true -Dgraphdb.page.cache.size=3G.
If you do not specify graphdb.page.cache.size but only enable the global cache, it will take 50% of the -Xmx parameter.
Note
You do not have to change/edit your repository configurations. The new cache will be used when you upgrade to the new version.
Configure Entity pool memory¶
By default, all entity pool structures reside off-heap, i.e., outside of the normal JVM heap. This way, you do not have to account for the entity pool memory when setting the JVM maximum heap memory parameter for GraphDB. This means, however, that you need to leave some memory outside of the -Xmx.
To activate the old behavior, you can still enable on-heap allocation with -Dgraphdb.epool.onheap=true.
If you are concerned that the process will consume an unlimited amount of memory, you can specify a maximum size with -XX:MaxDirectMemorySize, which defaults to the -Xmx parameter (at least in OpenJDK and Oracle JDK).
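The heap, page cache, and off-heap flags mentioned in this section can be collected in one place. The helper below is a hypothetical sketch that only builds the option strings. How they are passed to the GraphDB process depends on your startup script and is not covered here. Sizes are whole gigabytes for simplicity.

```python
from typing import List, Optional


def graphdb_jvm_options(heap_gb: int,
                        page_cache_gb: Optional[int] = None,
                        max_direct_gb: Optional[int] = None) -> List[str]:
    """Assemble the memory-related JVM options described in this section."""
    options = [f"-Xmx{heap_gb}g"]
    if page_cache_gb is not None:
        # Enable the global page cache and give it an explicit size;
        # without an explicit size it takes 50% of -Xmx.
        options.append("-Dgraphdb.global.page.cache=true")
        options.append(f"-Dgraphdb.page.cache.size={page_cache_gb}G")
    if max_direct_gb is not None:
        # Cap off-heap (direct) memory, which otherwise defaults to -Xmx.
        options.append(f"-XX:MaxDirectMemorySize={max_direct_gb}g")
    return options


print(" ".join(graphdb_jvm_options(heap_gb=10, page_cache_gb=5, max_direct_gb=4)))
```

Leaving `page_cache_gb` or `max_direct_gb` unset omits the corresponding flags, so the defaults described above apply.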
Sample memory configuration¶
This is a sample configuration demonstrating how to correctly size a GraphDB server with a single repository. The loaded dataset is estimated at 500 million RDF statements and 150 million unique entities. As a rule of thumb, the ratio of unique entities to total statements in a standard dataset is 1:3.
Configuration parameter | Description | Example value |
---|---|---|
Total OS memory | Total physical system memory | 16 GB |
On-heap JVM (-Xmx) configuration | Maximum heap memory allocated by the JVM process | 10 GB |
graphdb.page.cache.size | Global single cache shared between all internal structures of all repositories (the default value is 50% of the heap size) | 5 GB |
Remaining on-heap memory for query execution | Raw estimate of the memory for query execution; a higher value is required if many long-running analytical queries are expected | ~4.5 GB |
entity-index-size | Size of the initial entity pool hash table; the recommended value is equal to the total number of unique entities | 150,000,000 |
Memory footprint of the entity pool stored off-heap by default | Calculated from … | ~2.5 GB |
Remaining OS memory | Raw estimate of the memory left to the OS | ~3.5 GB |
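The arithmetic behind the sample values can be sketched as follows. This is an illustrative calculation only: `memory_budget` is a hypothetical helper, and the 2.5 GB entity pool footprint is taken from the table as a given rather than derived.

```python
def memory_budget(total_gb: float, heap_gb: float, entity_pool_gb: float,
                  page_cache_fraction: float = 0.5) -> dict:
    """Split total system memory into the areas listed in the sample table."""
    page_cache_gb = heap_gb * page_cache_fraction  # default: 50% of the heap
    return {
        "page_cache_gb": page_cache_gb,
        # Upper bound on heap left for query execution once the cache is taken out.
        "heap_after_cache_gb": heap_gb - page_cache_gb,
        # Whatever is not heap or off-heap entity pool is left to the OS.
        "left_for_os_gb": total_gb - heap_gb - entity_pool_gb,
    }


budget = memory_budget(total_gb=16, heap_gb=10, entity_pool_gb=2.5)
print(budget)
```

With the sample inputs this gives a 5 GB page cache and 3.5 GB left to the OS, matching the table. The plain difference for query execution is 5 GB, while the table gives the more conservative ~4.5 GB, presumably leaving headroom for other heap use.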
Upper bounds for the memory consumed by the GraphDB process¶
In order to make sure that no OutOfMemoryExceptions are thrown while working with an active GraphDB repository, you need to set an upper bound value for the memory consumed by all instances of the tupleSet/distinct collections. This is done with the -Ddefault.min.distinct.threshold parameter, whose default value is 250m and can be changed. If this value is surpassed, a QueryEvaluationException is thrown so as to avoid running out of memory due to a memory-hungry distinct/group by operation.
Reconfigure a repository¶
Once a repository is created, it is possible to change some parameters, either by editing it in the Workbench or by setting a global override for a given property.
Note
When you change a repository parameter, you need to restart GraphDB for the changes to take effect.
Using the Workbench¶
To edit a repository parameter in the GraphDB Workbench, click the Edit icon for the repository whose parameters you want to edit.
Global overrides¶
It is also possible to override a repository parameter for all repositories by setting a configuration or system property. See Engine properties for more details on how to do it.
Rename a repository¶
Using the Workbench¶
Use the Workbench to change the repository ID. This will update all locations in the Workbench where the repository name is used.