GraphDB Free 7.0
Table of contents
- General
- Quick start guide
- Installation
- Administration
- Administration tasks
- Administration tools
- Creating a repository
- Configuring a repository
- Sizing guidelines
- Disk space requirements
- Configuring the Entity Pool
- Managing repositories
- Access rights and security
- Backing up and recovering a repository
- Query monitoring and termination
- Database health checks
- Diagnosing and reporting critical errors
- Usage
- Tools
- References
- Release notes
- FAQ
- Support
Configuring a repository¶
GraphDB uses a Sesame configuration template for configuring its repositories. Sesame 2.0 keeps the repository configurations with their parameters, modelled in RDF, in the SYSTEM repository. Therefore, in order to create a new repository, Sesame needs such an RDF file to populate the SYSTEM repository.
The GraphDB repository configuration templates are simple .ttl files in the /templates folder of the GraphDB distribution. Use them if you want to configure repositories programmatically; otherwise, you can use the GraphDB Workbench.
Tip
For hints related to rule-sets and reasoning, see Rules optimisations.
Steps¶
To configure a GraphDB repository, follow these steps:
- Check the Sizing guidelines section.
- Check the Disk space requirements section.
- Use the configuration spreadsheet from the doc folder of your distribution to calculate what you need for your setup.
- Check all GraphDB Configuration parameters, including their descriptions and their default and allowed values.
- Check the Configuring memory section.
- Change the default values of the repository properties when creating the repository.
For creating repositories, see Creating a repository.
A repository configuration template - how it works¶
The diagram below provides an illustration of an RDF graph that describes a repository configuration:

Often, it is helpful to ensure that a repository starts with a predefined set of RDF statements, usually one or more schema graphs. This is possible by using the owlim:imports property. After start-up, the files listed in this property are parsed and their contents are permanently added to the repository.
In short, the configuration is an RDF graph, where the root node is of rdf:type rep:Repository, and it must be connected through the rep:repositoryID property to a Literal that contains the human-readable name of the repository. The root node must be connected via the rep:repositoryImpl property to a node that describes the configuration.
The type of the repository is defined via the rep:repositoryType property and its value must be graphdb:FreeSailRepository to allow for custom Sail implementations (such as GraphDB) to be used in Sesame 2.0. Then, a node that specifies the Sail implementation to be instantiated must be connected through the sr:sailImpl property. To instantiate GraphDB, this last node must have a property sail:sailType with the value graphdb:FreeSail. The Sesame framework will then locate the correct SailFactory within the application classpath and use it to instantiate the Java implementation class.
The namespaces corresponding to the prefixes used in the above paragraph are as follows:
rep: <http://www.openrdf.org/config/repository#>
sr: <http://www.openrdf.org/config/repository/sail#>
sail: <http://www.openrdf.org/config/sail#>
owlim: <http://www.ontotext.com/trree/owlim#>
All properties used to specify the GraphDB configuration parameters use the owlim: prefix, and their local names match the Configuration parameters, e.g., the value of the ruleset parameter can be specified using the http://www.ontotext.com/trree/owlim#ruleset property.
Sample configuration¶
The following is an example configuration (in Turtle RDF format) of a Sesame 2 repository that uses a GraphDB Sail implementation:
# Sesame configuration template for a GraphDB Free repository
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix owlim: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
    rep:repositoryID "graphdb-test" ;
    rdfs:label "GraphDB Free repository" ;
    rep:repositoryImpl [
        rep:repositoryType "graphdb:FreeSailRepository" ;
        sr:sailImpl [
            sail:sailType "graphdb:FreeSail" ;
            owlim:base-URL "http://example.org/graphdb#" ;
            owlim:defaultNS "" ;
            owlim:entity-index-size "10000000" ;
            owlim:entity-id-size "32" ;
            owlim:imports "" ;
            owlim:repository-type "file-repository" ;
            owlim:ruleset "owl-horst-optimized" ;
            owlim:storage-folder "storage" ;
            owlim:enable-context-index "false" ;
            owlim:cache-memory "256m" ;
            owlim:tuple-index-memory "224m" ;
            owlim:enablePredicateList "true" ;
            owlim:predicate-memory "32m" ;
            owlim:in-memory-literal-properties "true" ;
            owlim:enable-literal-index "true" ;
            owlim:index-compression-ratio "-1" ;
            owlim:check-for-inconsistencies "false" ;
            owlim:disable-sameAs "false" ;
            owlim:enable-optimization "true" ;
            owlim:transaction-mode "safe" ;
            owlim:transaction-isolation "true" ;
            owlim:query-timeout "0" ;
            owlim:query-limit-results "0" ;
            owlim:throw-QueryEvaluationException-on-timeout "false" ;
            owlim:useShutdownHooks "true" ;
            owlim:read-only "false" ;
        ]
    ].
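For programmatic configuration, a template like the one above can be generated from a set of parameter values. The following is a minimal Python sketch, not part of GraphDB or Sesame: the helper names render_repository_config and turtle_string are hypothetical, and the code only builds the Turtle text (it does not create the repository). Only the prefixes, types and property names shown in the sample are used.

```python
# Hypothetical helper: renders a GraphDB Free repository template as Turtle text.
PREFIXES = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix owlim: <http://www.ontotext.com/trree/owlim#>.
"""


def turtle_string(value):
    """Quote a value as a Turtle string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_repository_config(repository_id, label, params):
    """Return Turtle text; params maps parameter names to values (owlim: local names)."""
    lines = [
        PREFIXES,
        "[] a rep:Repository ;",
        f"    rep:repositoryID {turtle_string(repository_id)} ;",
        f"    rdfs:label {turtle_string(label)} ;",
        "    rep:repositoryImpl [",
        '        rep:repositoryType "graphdb:FreeSailRepository" ;',
        "        sr:sailImpl [",
        '            sail:sailType "graphdb:FreeSail" ;',
    ]
    # Each configuration parameter becomes one owlim: property on the Sail node.
    for name, value in params.items():
        lines.append(f"            owlim:{name} {turtle_string(value)} ;")
    lines += ["        ]", "    ]."]
    return "\n".join(lines)


config = render_repository_config(
    "graphdb-test",
    "GraphDB Free repository",
    {"ruleset": "owl-horst-optimized", "storage-folder": "storage"},
)
print(config)
```

Parameters that are left out of the dictionary are simply not written, in which case the defaults listed under Configuration parameters would be expected to apply.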
Configuration parameters¶
This is a list of all repository configuration parameters. Some parameters can be changed (the change takes effect after a restart), some cannot be changed (a change has no effect), and others must NOT be changed once a repository has been created, because doing so will likely lead to inconsistent data (e.g., unsupported inferred statements, missing inferred statements, or inferred statements that cannot be deleted).
- repository-type (Cannot be changed)
  Default value: file-repository
  Possible values: file-repository, weighted-file-repository.
- storage-folder (Can be changed)
  Description: Specifies the folder where the index files will be stored.
  Default value: none
- ruleset (Must NOT be changed)
  Description: Sets of axiomatic triples, consistency checks and entailment rules, which determine the applied semantics.
  Default value: owl-horst-optimized
  Possible values: empty, rdfs, owl-horst, owl-max and owl2-rl, and their optimised counterparts rdfs-optimized, owl-horst-optimized, owl-max-optimized and owl2-rl-optimized. A custom ruleset is chosen by setting the path to its rule file (.pie).
- base-URL (Can be changed)
  Description: Specifies the default namespace for the main persistence file. Non-empty namespaces are recommended, because their use guarantees the uniqueness of the anonymous nodes that may appear within the repository.
  Default value: none
- entity-index-size (Cannot be changed)
  Description: Defines the number of entity hash table index entries. The bigger the size, the fewer the collisions in the hash table and the faster the entity retrieval. The entity hash table does not rehash, so its index size is constant throughout the life of the repository.
  Default value: 10000000
- cache-memory (Can be changed)
  Description: Specifies the total amount of memory to be given to all types of cache.
  Default value: <none>
- check-for-inconsistencies (Can be changed)
  Description: Turns the mechanism for consistency checking on and off. Consistency checks are defined in the rule file and are applied at the end of every transaction if this parameter is true. If an inconsistency is detected when committing a transaction, the whole transaction will be rolled back.
  Default value: false
- defaultNS (Cannot be changed)
  Description: Default namespaces corresponding to each imported schema file, separated by semicolons. The number of namespaces must be equal to the number of schema files in the imports parameter.
  Default value: <empty>
  Example: owlim:defaultNS "http://www.w3.org/2002/07/owl#;http://example.org/owlim#".
  Warning
  This parameter cannot be set via a command line argument.
- disable-sameAs (Must NOT be changed)
  Description: Enables or disables the owl:sameAs optimisation.
  Default value: false
- enable-context-index (Can be changed)
  Default value: false
  Possible value: true, where GraphDB will build and use the context index/indices.
- enable-literal-index (Can be changed)
  Description: Enables or disables the literal index. The literal index is always built as data is loaded/modified. This parameter only affects whether the index is used during query-answering.
  Default value: true
- enable-optimization (Can be changed)
  Description: Enables or disables query optimisation.
  Default value: true
  Warning
  Disabling query optimisation is rarely needed, usually only for debugging purposes. Be aware that disabling query optimisation also disables the correct behaviour of plugins (Full-text search, Geo-spatial extensions, RDF Rank, etc).
- enablePredicateList (Can be changed)
  Description: Enables or disables mappings from an entity (subject or object) to its predicates. Switching this on can significantly speed up queries that use wildcard predicate patterns.
  Default value: false
- entity-id-size (Cannot be changed)
  Description: Defines the bit size of internal IDs used to index entities (URIs, blank nodes and literals). In most cases, this parameter can be left at its default value. However, if very large datasets containing more than 2^32 entities are used, set this parameter to 40. Be aware that this can only be set when instantiating a new repository; converting an existing repository between 32-bit and 40-bit entity widths is not possible.
  Default value: 32
  Possible values: 32 and 40
- imports (Cannot be changed)
  Description: A list of schema files that will be imported at start up. All the statements found in these files will be loaded in the repository and will be treated as read-only. The serialisation format is determined by the file extension:
    .brf => BinaryRDF
    .n3 => N3
    .nq => N-Quads
    .nt => N-Triples
    .owl => RDF/XML
    .rdf => RDF/XML
    .rdfs => RDF/XML
    .trig => TriG
    .trix => TriX
    .ttl => Turtle
    .xml => TriX
  Default value: none
  Example: owlim:imports "./ont/owl.rdfs;./ont/ex.rdfs".
  Tip
  Schema files can be either a local path name, e.g., ./ontology/myfile.rdf, or a URL, e.g., http://www.w3.org/2002/07/owl.rdf. If this parameter is used, the default namespace for each imported schema file must be provided using the defaultNS parameter.
- index-compression-ratio (Cannot be changed)
  Description: The compression ratio of paged index files as a percentage of their uncompressed size. The value indicates how much smaller the compressed page should be, so a value of 25 (percent) will attempt to make the index files one quarter of their uncompressed size. Any page that cannot be compressed to this size will be stored uncompressed in a separate overlay file.
  Default value: -1
  Possible values: -1 (off) and the range [10-50]
  Recommended value: 30
- in-memory-literal-properties (Can be changed)
  Description: Turns caching of the literal languages and data-types on and off. If the caching is on and the entity pool is restored from persistence, but there is no such cache available on disk, it is created after the entity pool initialisation.
  Default value: false
- predicate-memory (Can be changed)
  Description: Specifies the amount of memory to be used for predicate lists cache.
  Default value: 32m
- query-limit-results (Can be changed)
  Description: Sets the maximum number of results returned from a query, after which the evaluation of the query will be terminated. Values less than or equal to zero mean no limit.
  Default value: 0 (no limit)
- query-timeout (Can be changed)
  Description: Sets the number of seconds after which the evaluation of a query will be terminated. Values less than or equal to zero mean no limit.
  Default value: 0 (no limit)
- read-only (Can be changed)
  Description: In this mode, no modifications are allowed to the data or namespaces.
  Default value: false
  Possible value: true, which puts the repository into read-only mode.
- throw-QueryEvaluationException-on-timeout (Can be changed)
  Default value: false
  Possible value: true; if set, a QueryEvaluationException is thrown when the duration of a query execution exceeds the time-out parameter.
- transaction-mode (Can be changed)
  Description: Specifies the transaction mode. In fast mode, dirty pages are written to disk in the laziest fashion possible, i.e., pages are only swapped when a new page is requested and there is no more memory available. No guarantees about data security are given when operating in this mode, so in the event of an abnormal termination, the database must be considered corrupted and will need to be recreated from scratch. When set to safe, all updates are flushed to disk at the end of each transaction. Commit operations normally take a little longer, but recovery after an abnormal termination is instant. This mode also has much better concurrency characteristics.
  Default value: safe
- transaction-isolation (Can be changed)
  Description: This parameter only has an effect when transaction-mode=fast. In fast mode, updates lock the repository, preventing concurrent query answering.
  Default value: true
  Possible value: false; if set, concurrent queries are permitted with the loss of isolation.
- tuple-index-memory (Can be changed)
  Description: Specifies the amount of memory to be used for statement storage cache.
  Default value: 224m
- useShutdownHooks (Can be changed)
  Default value: true. If set, the method OwlimSchemaRepository.shutdown() is called when the JVM exits (running GraphDB under Tomcat requires this parameter to be true, otherwise it cannot be guaranteed that the shutdown() method will be called at all).
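The imports and defaultNS parameters have to stay in step: both are semicolon-separated lists of equal length, and the format of each schema file follows from its extension. The sketch below shows one way to check a pair of values before writing them into a configuration. It is an illustration only; check_imports and split_list are hypothetical helper names, not GraphDB functions, and rejecting unknown extensions is an assumption of this sketch.

```python
import os

# Extension-to-format mapping as listed for the imports parameter.
FORMATS = {
    ".brf": "BinaryRDF", ".n3": "N3", ".nq": "N-Quads", ".nt": "N-Triples",
    ".owl": "RDF/XML", ".rdf": "RDF/XML", ".rdfs": "RDF/XML",
    ".trig": "TriG", ".trix": "TriX", ".ttl": "Turtle", ".xml": "TriX",
}


def split_list(value):
    """Split a semicolon-separated parameter value, ignoring empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


def check_imports(imports, default_ns):
    """Pair each schema file with its default namespace and serialisation format."""
    files = split_list(imports)
    namespaces = split_list(default_ns)
    if len(files) != len(namespaces):
        raise ValueError(
            f"{len(files)} schema file(s) but {len(namespaces)} default namespace(s)"
        )
    result = []
    for path, namespace in zip(files, namespaces):
        extension = os.path.splitext(path)[1]
        if extension not in FORMATS:
            # Assumption of this sketch: unknown extensions are treated as errors.
            raise ValueError(f"unrecognised file extension: {path}")
        result.append((path, namespace, FORMATS[extension]))
    return result


pairs = check_imports(
    "./ont/owl.rdfs;./ont/ex.rdfs",
    "http://www.w3.org/2002/07/owl#;http://example.org/owlim#",
)
for path, namespace, rdf_format in pairs:
    print(path, namespace, rdf_format)
```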
Configuring memory¶
Configuring the memory used by GraphDB is the single most important factor for optimal performance: the more memory available, the better the performance. The available Java heap memory is used by:
- the JVM, the application and the GraphDB workspace (byte code, stacks, etc.);
- data structures for storing entities, whose size is affected by entity-index-size;
- data structures for indexing statements, whose size is specified using cache-memory.
The following diagram offers a view of the memory use in GraphDB:

The challenge is how to divide up the available memory between the various GraphDB data structures in order to achieve the best overall behaviour.
Configuring the JVM, the application and the GraphDB workspace memory¶
Running in the embedded servlet container¶
Specify the maximum amount of heap space used by a JVM via the -Xmx virtual machine parameter.
Note
The value should be no higher than the amount of free memory available in the target system multiplied by some factor to allow for extra runtime overhead (e.g., approximately 90%).
For example, if a system has 16GB of RAM in total and 1GB is used by the operating system, services, etc., ideally, the JVM that hosts the application using GraphDB should have a maximum heap size of 15GB (16-1), which can be set using the JVM argument -Xmx15g.
Running in a servlet container¶
If the GraphDB repository is hosted by the Sesame HTTP servlet, the maximum heap space applies to the servlet container (Tomcat).
Note
Allow some more heap memory for the runtime overhead, especially if running at the same time as other servlets.
Other options that may improve performance involve the configuration of the servlet container, e.g., increasing the permanent generation, which is 64MB by default. For Tomcat, it can be quadrupled with -XX:MaxPermSize=256m.
For more information, see the Tomcat documentation.
Configuring the memory for storing entities¶
The memory required for storing entities is determined by the number of entities in the dataset: 4 bytes per slot allocated by entity-index-size, plus 12 bytes for each stored entity.
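As a rough worked example of this formula, the Python sketch below estimates the entity memory for a given entity-index-size and entity count, and picks an entity-id-size using the 2^32 threshold mentioned under Configuration parameters. The function names and the figure of 50 million entities are illustrative assumptions, not values from GraphDB.

```python
BYTES_PER_SLOT = 4      # per hash table slot allocated by entity-index-size
BYTES_PER_ENTITY = 12   # per stored entity


def entity_memory_bytes(entity_index_size, entity_count):
    """Estimate the memory needed for storing entities, in bytes."""
    return BYTES_PER_SLOT * entity_index_size + BYTES_PER_ENTITY * entity_count


def entity_id_size(entity_count):
    """Return 40 for datasets with more than 2^32 entities, otherwise 32."""
    return 40 if entity_count > 2**32 else 32


# Default entity-index-size with a hypothetical dataset of 50 million entities.
estimate = entity_memory_bytes(10_000_000, 50_000_000)
print(f"{estimate} bytes, about {estimate / 2**20:.0f} MB")
print("entity-id-size:", entity_id_size(50_000_000))
```

For these inputs the estimate is 40,000,000 bytes for the index slots plus 600,000,000 bytes for the entities, 640,000,000 bytes in total.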
Configuring the Cache memory¶
Apart from the I/O buffers used for caching, GraphDB keeps the indexes of the nodes in the RDF graph in memory. This is a design decision that improves the overall performance of the repository. Each I/O buffer (page) is exactly 64KB and the indexing information per node in the graph is 12 bytes, so memory requirements per repository vary with the dataset. To ease the calculation of the amount of Java heap memory required for a GraphDB repository, an Excel spreadsheet, graphdb-configurator.xls, is included in the distribution.
The page cache is organised in two sets of buffers, read-only and dirty. Each page is first loaded into the read-only cache. When this gets full, a page (if dirty) is moved to the dirty cache, where it can be later written to the storage.
Cache memory distribution¶
There are several components in GraphDB that make use of caching (e.g., predicate list, tuple indices). In different situations, certain caches will need more memory than others. GraphDB allows for the configuration of both the total cache memory to be used by a repository and all the separate per-module caches.
Parameters¶
The following parameters control the amount of memory assigned to each of the different caches:
Parameter | Unit | Default | Description |
cache-memory | bytes | 256M | The amount of memory to be distributed among different caches. |
tuple-index-memory | bytes | 224M | Memory used for PSO and POS caches. |
predicate-memory | bytes | 32M | Memory used for predicate list cache. |
Note
All parameters can be specified in bytes, kilobytes, megabytes or gigabytes by using a unit specifier at the end of the integer number. When no unit specifier is given, the value is interpreted as bytes; otherwise, use k or K for kilobytes, m or M for megabytes, and g or G for gigabytes (everything base 2).
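The unit rules in this note can be illustrated with a small Python sketch that converts a memory setting to bytes. parse_memory is a hypothetical helper written for this example; it is not GraphDB's own parser and does no validation beyond what int() provides.

```python
# Base-2 multipliers for the unit specifiers (upper or lower case).
UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_memory(value):
    """Convert a setting such as '256m', '32M' or '1024' to a number of bytes."""
    text = value.strip()
    if text and text[-1].lower() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].lower()]
    # No unit specifier: the number is already in bytes.
    return int(text)


for setting in ("256m", "224M", "32m", "1g", "1024"):
    print(setting, "=", parse_memory(setting), "bytes")
```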
The PSO and POS indices are always used, while predicateLists and the context indices PCSO / PSOC are optional. The memory allocated to these cache types can be calculated automatically by GraphDB, but some of them can be specified in a more fine-grained way. The following configuration parameters are relevant:
cache-memory = tuple-index-memory + predicate-memory
If cache-memory is explicitly configured and some of the other memory parameters are omitted, the missing values are resolved by uniformly distributing the remaining memory after all the explicitly configured memory parameters are subtracted.
For example, if cache-memory = 256M, predicate-memory = 32M and the other memory parameter is missing, then it is implicitly assigned (256M - 32M) = 224M.
If cache-memory is not specified, then all the missing memory parameters are assigned their default values.
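The resolution rules above can be sketched in Python as follows. This illustrates the described behaviour and is not GraphDB's implementation: resolve_cache_memory is a hypothetical name, integer division is assumed for the uniform split, and raising an error when the explicit values exceed cache-memory is an assumption of this sketch.

```python
MB = 2**20

# Defaults of the per-module caches, from the parameter table.
DEFAULTS = {"tuple-index-memory": 224 * MB, "predicate-memory": 32 * MB}


def resolve_cache_memory(cache_memory=None, explicit=None):
    """Return the size in bytes of each per-module cache.

    cache_memory: total cache size in bytes, or None if not configured.
    explicit: dict of the per-module parameters that were set explicitly.
    """
    explicit = dict(explicit or {})
    if cache_memory is None:
        # No total configured: missing parameters take their default values.
        return {name: explicit.get(name, size) for name, size in DEFAULTS.items()}
    missing = [name for name in DEFAULTS if name not in explicit]
    remaining = cache_memory - sum(explicit.values())
    if remaining < 0:
        # Assumption of this sketch: over-allocation is treated as an error.
        raise ValueError("explicit cache sizes exceed cache-memory")
    resolved = dict(explicit)
    for name in missing:
        # The remaining memory is distributed uniformly among missing parameters.
        resolved[name] = remaining // len(missing)
    return resolved


resolved = resolve_cache_memory(256 * MB, {"predicate-memory": 32 * MB})
print({name: size // MB for name, size in resolved.items()})
```

With cache-memory = 256M and predicate-memory = 32M, the sketch assigns the remaining 224M to tuple-index-memory, matching the example above.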