Requirements

Note

In addition to the technical requirements outlined below, some GraphDB features also require an Enterprise Edition license. Check the Licensing documentation for further information on the different types of GraphDB licenses.

Minimum requirements

The minimum requirements allow loading datasets of up to 50 million RDF triples.

  • 3GB of memory

  • 8GB of storage space

  • Java SE Development Kit 11 to 16 (not required for GraphDB Free desktop installation)

Warning

All GraphDB indexes are optimized for storage with very low seek time. Our team highly recommends using only SSD storage for persisting repository images.

We strongly advise against the use of network file systems (NFS). With NFS, updated files are written to the file system in full, which results in significant update times for repositories with 10M or more statements, especially when processing large transactions. For comparison, block storage file systems need to access only the updated blocks.

Hardware sizing

The best approach to correctly sizing the hardware resources is to estimate the number of explicit statements. Statistically, an average dataset has a 3:1 ratio of statements to unique RDF resources. The total number of statements determines the expected repository image size, while the number of unique resources affects the memory footprint required to initialize the repository.
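As a hypothetical worked example of the 3:1 heuristic above (the dataset size is illustrative, not a recommendation):

```shell
# Rough sizing estimate for a dataset of 300M total statements,
# using the 3:1 statements-to-unique-resources heuristic.
STATEMENTS=300000000
RESOURCES=$((STATEMENTS / 3))   # unique RDF resources, which drive the memory footprint
echo "$RESOURCES"               # prints 100000000
```

The resulting resource count feeds the memory estimate, while the statement count itself maps to a row in the sizing table below.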

The table below summarizes the recommended parameters for planning RAM and disk sizing:

  • Statements is the planned number of explicit + implicit statements.

  • Java heap (minimal) is the minimal recommended JVM heap for operating the database, controlled by the -Xmx parameter.

  • Java heap (optimal) is the recommended JVM heap for operating the database, also controlled by the -Xmx parameter.

  • Off-heap is the expected memory used outside the JVM heap.

  • OS is the recommended minimal RAM reserved for the operating system.

  • Total is the total RAM recommended for the hardware configuration (optimal heap + off-heap + OS).

  • Repository image is the expected size on disk. For repositories with inference, use the total number of explicit + implicit statements.

Note

Implicit statements are materialized and stored like regular statements, and their number (the expansion ratio) varies depending on the specific data and the chosen ruleset.

Statements   Java heap (min)   Java heap (opt)   Off-heap   OS    Total   Repository image
150M         5GB               6GB               1GB        2GB   9GB     17GB
300M         8GB               12GB              2GB        3GB   17GB    34GB
750M         12GB              16GB              3GB        4GB   23GB    72GB
1.5B         32GB              32GB              7GB        4GB   43GB    150GB
3B           50GB              58GB              12GB       4GB   74GB    350GB
7.5B         64GB              68GB              14GB       4GB   86GB    720GB
15B          80GB              88GB              17GB       4GB   109GB   1450GB
30B          128GB             128GB             25GB       6GB   159GB   2900GB
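As an illustration, the optimal heap for the 300M row could be applied at startup. GraphDB's launcher scripts typically accept the maximum heap via the GDB_HEAP_SIZE environment variable, which maps to the JVM -Xmx setting; treat the exact variable name and home path as assumptions and verify them against your distribution:

```shell
# Sketch: start GraphDB with the optimal heap for ~300M statements (12GB).
# GDB_HEAP_SIZE is assumed to map to -Xmx in the bin/graphdb launcher;
# /opt/graphdb/home is a hypothetical data directory.
GDB_HEAP_SIZE=12g ./bin/graphdb -Dgraphdb.home=/opt/graphdb/home
```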

Note

If you are planning on using the external cluster proxy functionality, we recommend at least 1GB of RAM and a single CPU core. In Kubernetes, a CPU request of 500m would suffice.

Warning

Running a repository in a cluster doubles the requirements for the repository image storage. The table above provides example sizes for a single repository and does not take restoring backups or snapshot replication into consideration.

Memory management

Memory management in GraphDB is a balance between performance and the resources available per repository. Under heavy use, such as parallel imports into a number of repositories, GraphDB may take up more memory than usual.

Several configuration properties control the amount of memory used by GraphDB:

  • Reduce the global cache: by default, the global page cache can take up to 40% (or up to 40GB for heap sizes above 100GB) of the memory allocated to GraphDB, which can be critical during periods of stress. Reducing the cache size frees up memory for the actual operations. This can be beneficial during prolonged imports, as the imported data is not likely to be queried right away.

    graphdb.page.cache.size=2g

  • Reduce the buffer size: this property controls the number of statements that GraphDB can hold in buffers. By default, it is sized at 200,000 statements, which can impact memory usage when many repositories are actively reading/writing data at once. The optimal buffer size depends on the hardware used, as reducing it causes more read/write operations against the actual storage.

    pool.buffer.size=50000

  • Disable parallel import: during periods of prolonged imports to a large number of repositories, parallel imports can take up more than 800 megabytes of retained heap per repository. In such cases, parallel importing can be disabled, which would force data to be imported serially to each repository. However, serial import reduces performance.

    graphdb.engine.parallel-import=false
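Taken together, a memory-constrained deployment might combine the three settings above in graphdb.properties. The values are illustrative, matching the individual examples, and are not a recommendation for every deployment:

```
# Illustrative low-memory profile for graphdb.properties
graphdb.page.cache.size=2g
pool.buffer.size=50000
graphdb.engine.parallel-import=false
```

Each setting trades some import or query performance for a smaller retained heap, as quantified in the table below.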

This table shows an example of retained heap usage per repository under different configurations:

                                   Retained heap per repository
Configuration                      During prolonged import        Stale
Default                            ≥800MB                         340MB
+ Reduced global cache (2GB)       670MB                          140MB
+ Reduced buffer size*             570-620MB                      140MB
+ Reduced inference pool size*     370-550MB                      140MB
Serial import**                    210-280MB                      140MB

* Depends on the number of CPU cores available to GraphDB. For these measurements, the buffer size was reduced from the default 200,000 statements to 50,000, and the inference pool size from eight to three. Keep in mind that this reduces performance.

** Without reducing buffer and inference pool sizes. Disables parallel import, which impacts performance.