Configuring GraphDB Memory

Configure Java heap memory

The following diagram offers a view of the memory use by the GraphDB structures and processes:

[Diagram: total Java heap memory used by the GraphDB structures and processes]

To specify the maximum amount of heap space used by a JVM, use the -Xmx virtual machine parameter.

As a general rule, the -Xmx value should not exceed 2/3 of the total system memory. For example, on a system with 8 gigabytes of RAM, where 1 gigabyte is used by the operating system, services, etc., and another 1 gigabyte by the entity pool and the hash maps (which reside off-heap), the JVM hosting GraphDB should ideally have a maximum heap size of 6 gigabytes, set with the JVM argument -Xmx6g.
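For example, here is a minimal sketch of the 8-gigabyte scenario above, assuming a standard GraphDB distribution where JVM options are passed through the GDB_JAVA_OPTS environment variable and the server is started with the bin/graphdb script:

    # 8 GB machine: leave ~2 GB for the OS and the off-heap structures,
    # and give the remaining 6 GB to the JVM heap
    export GDB_JAVA_OPTS="-Xmx6g"
    ./bin/graphdb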

Single global page cache

GraphDB’s cache strategy, the single global page cache, employs the concept of one global cache shared between all internal structures of all repositories. This way, you no longer have to configure the cache-memory, tuple-index-memory, and predicate-memory parameters, or size every repository and calculate the amount of memory dedicated to it. If at a given moment one of the repositories is used more heavily, it naturally gets more slots in the cache.

The current global cache implementation can be enabled by specifying: -Dgraphdb.global.page.cache=true -Dgraphdb.page.cache.size=3G.

If you enable the global cache without specifying graphdb.page.cache.size, the cache defaults to 50% of the maximum heap size set with the -Xmx parameter.
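For example, a sketch enabling the global cache with an explicit 3-gigabyte size, using the same GDB_JAVA_OPTS mechanism as above (omit graphdb.page.cache.size to fall back to the 50% default):

    # One shared 3 GB page cache for all repositories on a 6 GB heap
    export GDB_JAVA_OPTS="-Xmx6g -Dgraphdb.global.page.cache=true -Dgraphdb.page.cache.size=3G"
    ./bin/graphdb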

Note

You do not have to edit your repository configurations. The new cache is used automatically when you upgrade to the new version.

Configure Entity pool memory

By default, all entity pool structures reside off-heap, i.e., outside of the normal JVM heap. This way, you do not have to account for the entity pool when setting the JVM maximum heap memory parameter for GraphDB. It means, however, that you need to leave some free memory outside of -Xmx for these structures.

To activate the old behavior, you can still enable on-heap allocation with -Dgraphdb.epool.onheap=true.

If you are concerned that the process may consume an unbounded amount of off-heap memory, you can cap it with -XX:MaxDirectMemorySize, which defaults to the -Xmx value (at least in OpenJDK and Oracle JDK).
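A sketch combining both options; the 3-gigabyte cap here is an illustrative value, not a recommendation:

    # Keep the entity pool off-heap (the default) but bound the direct
    # memory it may allocate; set -Dgraphdb.epool.onheap=true instead to
    # move the entity pool back onto the JVM heap
    export GDB_JAVA_OPTS="-Xmx6g -XX:MaxDirectMemorySize=3g"
    ./bin/graphdb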

Sample memory configuration

This is a sample configuration demonstrating how to correctly size a GraphDB server with a single repository. The loaded dataset is estimated at 500 million RDF statements and 150 million unique entities. As a rule of thumb, in a standard dataset the ratio of unique entities to total statements is about 1:3.

| Configuration parameter | Description | Example value |
|---|---|---|
| Total OS memory | Total physical system memory | 16 GB |
| On-heap JVM (-Xmx) configuration | Maximum heap memory allocated by the JVM process | 10 GB |
| graphdb.page.cache.size | Global single cache shared between all internal structures of all repositories (the default value is 50% of the heap size) | 5 GB |
| Remaining on-heap memory for query execution | Rough estimate of the memory available for query execution; a higher value is required if many long-running analytical queries are expected | ~4.5 GB |
| entity-index-size ("Entity index size"), stored off-heap by default | Size of the initial entity pool hash table; the recommended value is equal to the total number of unique entities | 150,000,000 |
| Memory footprint of the entity pool, stored off-heap by default | Calculated from entity-index-size and the total number of entities; this memory is allocated after repository initialization | ~2.5 GB |
| Remaining OS memory | Rough estimate of the memory left to the OS | ~3.5 GB |
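Taken together, the sizing above could translate into a startup configuration along these lines. This is a sketch: the 3-gigabyte direct memory cap is an assumption derived from the ~2.5 GB entity pool estimate, and entity-index-size is set in the repository configuration rather than as a JVM option:

    # 16 GB machine: 10 GB heap, of which 5 GB goes to the global page
    # cache; the ~2.5 GB off-heap entity pool is capped at 3 GB,
    # leaving roughly 3.5 GB to the operating system
    export GDB_JAVA_OPTS="-Xmx10g -Dgraphdb.global.page.cache=true -Dgraphdb.page.cache.size=5G -XX:MaxDirectMemorySize=3g"
    ./bin/graphdb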

Upper bounds for the memory consumed by the GraphDB process

To make sure that no OutOfMemoryError is thrown while working with an active GraphDB repository, you need to set an upper bound for the memory consumed by all instances of the tupleSet/distinct collections. This is done with the -Ddefault.min.distinct.threshold parameter, whose default value is 250m and can be changed. If this value is exceeded, a QueryEvaluationException is thrown, so the server avoids running out of memory due to a memory-hungry DISTINCT/GROUP BY operation.
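For example, a sketch raising the bound to 1 gigabyte (the value format is an assumption by analogy with the 250m default above):

    # Allow DISTINCT/GROUP BY collections to grow to 1 GB before a
    # QueryEvaluationException is thrown instead of exhausting the heap
    export GDB_JAVA_OPTS="-Xmx10g -Ddefault.min.distinct.threshold=1g"
    ./bin/graphdb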