Requirements¶
Minimum requirements¶
The minimum requirements allow loading datasets of up to 50 million RDF triples:

- 2 gigabytes of memory
- 2 gigabytes of disk space
- Java SE Development Kit 11 to 16 (not required for the GraphDB Free desktop installation)
Warning
All GraphDB indexes are optimized for disks with very low seek time. We highly recommend using an SSD partition for persisting repository images.
Hardware sizing¶
The best approach for correctly sizing the hardware resources is to estimate the number of explicit statements. Statistically, an average dataset has 3:1 statements to unique RDF resources. The total number of statements determines the expected repository image size, and the number of unique resources affects the memory footprint required to initialize the repository.
The table below summarizes the recommended parameters for planning RAM and disk sizing:
- Statements is the planned number of explicit statements.
- Unique resources is the expected number of unique RDF resources (IRIs, blank nodes, literals, RDF-star [formerly RDF*] embedded triples).
- Java heap (minimal) is the minimal recommended JVM heap required to operate the database, controlled by the -Xmx parameter.
- Java heap (optimal) is the recommended JVM heap required to operate the database, controlled by the -Xmx parameter.
- Off heap is the database memory footprint (outside of the JVM heap) required to initialize the database.
- OS is the recommended minimal space reserved for the operating system.
- Total is the RAM required for the hardware configuration.
- Repository image is the expected size on disk. For repositories with inference, use the total number of explicit + implicit statements.
| Statements | Unique resources | Java heap (min) | Java heap (opt) | Off heap | OS | Total | Repository image |
|---|---|---|---|---|---|---|---|
| 100M | 33.3M | 1.2GB | 3.6GB | 370MB | 2GB | 6GB | 12GB |
| 200M | 66.6M | 2.4GB | 7.2GB | 740MB | 3GB | 11GB | 24GB |
| 500M | 166.5M | 6GB | 18GB | 1.86GB | 4GB | 24GB | 60GB |
| 1B | 333M | 12GB | 30GB | 3.72GB | 4GB | 38GB | 120GB |
| 2B | 666M | 24GB | 30GB | 7.44GB | 4GB | 42GB | 240GB |
| 5B | 1.665B | 30GB | 30GB | 18.61GB | 4GB | 53GB | 600GB |
| 10B | 3.330B | 30GB | 30GB | 37.22GB | 4GB | 72GB | 1200GB |
| 20B | 6.660B | 30GB | 30GB | 74.43GB | 4GB | 109GB | 2400GB |
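Reading across the table rows, the recommendations scale roughly linearly with the statement count until the heap values cap at 30GB. As a back-of-the-envelope planning aid, the per-statement constants can be read off the table; the sketch below is an approximation inferred from those rows, not an official sizing formula:

```python
def estimate_sizing(statements):
    """Rough RAM/disk estimates for a planned number of explicit statements.

    The per-statement constants are linear approximations read off the
    sizing table above (heap values cap at 30GB); treat the results as a
    planning aid only.
    """
    unique_resources = statements / 3            # ~3:1 statements to resources
    heap_min_gb = min(statements * 12e-9, 30)    # ~12 bytes per statement
    heap_opt_gb = min(statements * 36e-9, 30)    # ~36 bytes per statement
    off_heap_gb = statements * 3.72e-9           # ~3.72 bytes per statement
    image_gb = statements * 120e-9               # ~120 bytes per statement on disk
    return {
        "unique_resources": unique_resources,
        "heap_min_gb": heap_min_gb,
        "heap_opt_gb": heap_opt_gb,
        "off_heap_gb": off_heap_gb,
        "repository_image_gb": image_gb,
    }

# 500 million explicit statements
s = estimate_sizing(500e6)
print(f"-Xmx{s['heap_min_gb']:.0f}g")  # suggested minimal heap flag: -Xmx6g
```

The resulting heap value is what you would pass to the JVM via the -Xmx parameter mentioned above.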
Memory management¶
Memory management in GraphDB is a balance between performance and resource availability per repository. In heavy use cases, such as parallel imports into many repositories, GraphDB may use more memory than usual.
Several configuration properties control how much memory GraphDB uses:
- Reduce the global cache: by default, it can take up to half of the memory allocated to GraphDB, which can be critical under heavy load. Reducing the cache size frees memory for the actual operations, which is beneficial during prolonged imports, as newly imported data is unlikely to be queried right away.

  graphdb.page.cache.size=2g

- Reduce the buffer size: this property controls the number of statements that GraphDB can hold in buffers. By default, it is sized at 200,000 statements, which can impact memory usage when many repositories read and write data at once. The optimal buffer size depends on the hardware, as reducing it causes more read/write operations against the actual storage.

  pool.buffer.size=50000

- Reduce the inference pool size: this property controls the number of inference workers used for forward inferencing during import. Reducing the number of workers can have a limited benefit, depending on the number of CPU cores available under the GraphDB license. For the table below, we used an 8-core license and reduced the number of workers per repository to three.

  infer.pool.size=3

- Disable parallel import: during prolonged imports into a large number of repositories, parallel import can take up more than 800 megabytes of retained heap per repository. In such cases, parallel import can be disabled, forcing data to be imported serially into each repository. Note that serial import reduces performance.

  graphdb.engine.parallel-import=false
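For reference, the properties above could be collected in the instance configuration file, typically conf/graphdb.properties in the distribution. This is a sketch with illustrative values; tune them to your workload, and consult the configuration reference for whether a given property should be set globally or per repository:

```properties
# Example graphdb.properties fragment (illustrative values)
graphdb.page.cache.size=2g
pool.buffer.size=50000
infer.pool.size=3
graphdb.engine.parallel-import=false
```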
This table shows an example of retained heap usage per repository with different configuration parameters:
| Configuration | Retained heap (during prolonged import) | Retained heap (stale) |
|---|---|---|
| Default | ≥800MB | 340MB |
| + Reduced global cache (2GB) | 670MB | 140MB |
| + Reduced buffer size* | 570-620MB | 140MB |
| + Reduced inference pool size* | 370-550MB | 140MB |
| Serial import** | 210-280MB | 140MB |
* Depends on the number of CPU cores available to GraphDB. For the measurements above, the buffer size was reduced from 200,000 (default) to 50,000 statements, and the inference pool size from eight to three. Keep in mind that this reduces performance.
** Without reducing buffer and inference pool sizes. Disables parallel import, which impacts performance.
Licensing¶
GraphDB EE is available under an RDBMS-like commercial license on a per-server-CPU basis. It is neither free nor open source. To purchase a license or obtain a copy for evaluation, please contact graphdb-info@ontotext.com.