Disk space requirements

GraphDB disk space requirements per statement

GraphDB computes inferences when new explicit statements are committed to the repository. The number of inferred statements can be zero, when using the ‘empty’ ruleset, or many multiples of the number of explicit statements (depending on the chosen ruleset and the complexity of the data).

The disk space required for each statement also depends on the size of its URIs and literals. A typical dataset requires around 200 bytes per statement with only the default indices, and up to about 300 bytes per statement when all optional indices are turned on.

So, when using the default indices, a good estimate of the disk space you will need is 200 bytes per statement (explicit and inferred). For reference:

  • 1B triples take about 90 GB (without context indexes) to 120 GB (with context indexes) of disk space
  • 1B triples have around 300M unique RDF resources (entities), which consume 2.8 GB of RAM
  • 10B triples have around 3B unique RDF resources (entities), which consume 28 GB of RAM
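
The per-statement rule of thumb can be turned into a quick calculation. The following is a minimal sketch, not part of GraphDB: the function name and the `inferred_ratio` parameter (inferred statements per explicit statement) are assumptions made for illustration, and the 200-byte default is the estimate quoted above for the default indices.

```python
def estimate_disk_bytes(explicit_statements, inferred_ratio=0.0, bytes_per_statement=200):
    """Rough disk estimate: (explicit + inferred) statements times bytes per statement.

    inferred_ratio is a hypothetical input: the number of inferred statements
    produced per explicit statement (0.0 for the 'empty' ruleset).
    """
    total_statements = explicit_statements * (1.0 + inferred_ratio)
    return total_statements * bytes_per_statement


# 100M explicit statements, assuming one inferred statement for every two explicit ones
estimate = estimate_disk_bytes(100_000_000, inferred_ratio=0.5)
print(f"{estimate / 1e9:.1f} GB")  # prints "30.0 GB"
```

The inference ratio varies widely with the ruleset and the data, so it is best measured on a representative sample before being used for capacity planning.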

GraphDB disk space requirements for loading a dataset

The disk space needed to load a dataset depends on the reasoning complexity (the number of inferred triples), the length of the URIs, the additional indices used, and so on. For example, the following table shows the required disk space in bytes per explicit statement when loading the WordNet dataset with various GraphDB configurations:

Configuration                       Bytes per explicit statement
owl2-rl + all optional indices      366
owl2-rl                             236
owl-horst + all optional indices    290
owl-horst                           196
empty + all optional indices        240
empty                               171
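
As a rough illustration, the figures in this table can be used as a lookup when sizing storage from a count of explicit statements. This is a sketch under the assumption that your data behaves like the WordNet dataset; the dictionary and function names are hypothetical and do not correspond to any GraphDB API.

```python
# Bytes per explicit statement, keyed by (ruleset, all optional indices enabled).
# Values are the WordNet measurements from the table above.
BYTES_PER_EXPLICIT_STATEMENT = {
    ("owl2-rl", True): 366,
    ("owl2-rl", False): 236,
    ("owl-horst", True): 290,
    ("owl-horst", False): 196,
    ("empty", True): 240,
    ("empty", False): 171,
}


def estimate_load_bytes(explicit_statements, ruleset, all_optional_indices=False):
    """Estimate disk usage in bytes for loading a number of explicit statements."""
    return explicit_statements * BYTES_PER_EXPLICIT_STATEMENT[(ruleset, all_optional_indices)]


# 50M explicit statements with owl-horst and default indices
size = estimate_load_bytes(50_000_000, "owl-horst")
print(f"{size / 1e9:.1f} GB")  # prints "9.8 GB"
```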

When planning storage capacity based on the input RDF file size, the required disk space depends not only on the GraphDB configuration, but also on the RDF file format used and the complexity of its contents. The following table gives a rough estimate of the expected expansion from an input RDF file to GraphDB storage, expressed as gigabytes of storage per gigabyte of input. For example, when using OWL2-RL with all optional indices turned on, GraphDB needs about 6.7 GB of storage space to load a one-gigabyte N3 file. With no inference (‘empty’) and no optional indices, GraphDB needs about 0.7 GB of storage space to load a one-gigabyte TriX file. Again, these results were obtained with the WordNet dataset:

Configuration                       N3     N-Triples   RDF/XML   TriG   TriX   Turtle
owl2-rl + all optional indices      6.7    2.2         4.8       6.6    1.5    6.7
owl2-rl                             4.3    1.4         3.1       4.2    1.0    4.3
owl-horst + all optional indices    5.3    1.7         3.8       5.2    1.2    5.3
owl-horst                           3.6    1.2         2.6       3.5    0.8    3.6
empty + all optional indices        4.4    1.4         3.1       4.3    1.0    4.4
empty                               3.1    1.0         2.2       3.1    0.7    3.1
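
The expansion factors above can be applied directly to an input file size. The sketch below is illustrative only: the names are hypothetical, the factors are the WordNet measurements from the table, and other datasets may expand differently.

```python
# Storage expansion factor (GB of GraphDB storage per GB of input file).
FORMATS = ("N3", "N-Triples", "RDF/XML", "TriG", "TriX", "Turtle")
EXPANSION = {
    "owl2-rl + all optional indices": (6.7, 2.2, 4.8, 6.6, 1.5, 6.7),
    "owl2-rl": (4.3, 1.4, 3.1, 4.2, 1.0, 4.3),
    "owl-horst + all optional indices": (5.3, 1.7, 3.8, 5.2, 1.2, 5.3),
    "owl-horst": (3.6, 1.2, 2.6, 3.5, 0.8, 3.6),
    "empty + all optional indices": (4.4, 1.4, 3.1, 4.3, 1.0, 4.4),
    "empty": (3.1, 1.0, 2.2, 3.1, 0.7, 3.1),
}


def estimate_storage_gb(input_file_gb, configuration, rdf_format):
    """Estimate GraphDB storage in GB for an input RDF file of the given size and format."""
    factor = EXPANSION[configuration][FORMATS.index(rdf_format)]
    return input_file_gb * factor


# A 2.5 GB N-Triples file loaded with owl-horst and default indices
print(f"{estimate_storage_gb(2.5, 'owl-horst', 'N-Triples'):.1f} GB")  # prints "3.0 GB"
```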