importrdf

The importrdf tool is used for offline loading of datasets. It supports two sub-commands: Load and Preload.

See more about loading data with ImportRDF here.

Load command line options

Usage: importrdf load [option] [file]

Option                          Short version  Description
------------------------------  -------------  -----------
--config-file <file>            -c             Repository-defining .ttl file.
--force                         -f             Whether to overwrite the existing repository.
--help                          -h             Display this message and exit.
--repository <repository-name>  -i             Name of an existing repository.
--mode <serial|parallel>        -m             Single-threaded (serial) or multi-threaded (parallel) mode for parse/load/infer.
--partial-load                  -p             Whether to allow partial load of a file that contains a corrupt line.
--stop-on-error                 -s             Whether to stop the process if the dataset contains a corrupt file.
--verbose                       -v             Whether to print metrics during load.

Note

The --partial-load option loads data up to the first corrupt line of the file.
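The behaviour described for --partial-load can be pictured with a small Python sketch. It is a toy model only: the `is_well_formed` check and the sample statements are hypothetical and much simpler than a real RDF parser, and nothing here reflects how importrdf is implemented internally.

```python
from itertools import takewhile


def is_well_formed(line: str) -> bool:
    """Toy validity check (hypothetical): at least three terms followed by ' .'."""
    stripped = line.strip()
    return stripped.endswith(" .") and len(stripped.split()) >= 4


def partial_load(lines):
    """Keep statements up to, but not including, the first corrupt line."""
    return list(takewhile(is_well_formed, lines))


# Hypothetical sample data: the third line is corrupt.
data = [
    "<urn:a> <urn:p> <urn:b> .",
    "<urn:b> <urn:p> <urn:c> .",
    "this line is corrupt",
    "<urn:c> <urn:p> <urn:d> .",
]
loaded = partial_load(data)
print(len(loaded))  # 2: the statement after the corrupt line is not loaded
```

In this model, everything after the first corrupt line is skipped, even if later lines are valid.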

The --mode option specifies how the data is loaded into the repository:

  • serial: parsing, entity resolution, load, and inference run one after another in a single thread.

  • parallel: parsing, entity resolution, load, and inference are multi-threaded. This gives a significant speedup when loading large datasets with inference enabled.

If no mode is selected, serial will be used.
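As an illustration of how the Load options combine, the following Python sketch assembles an `importrdf load` command line from the long option names listed above. The helper `build_load_command`, the repository name `my-repo`, and the file name `data.ttl` are hypothetical, and the sketch assumes the `importrdf` executable is on the PATH. It only builds the argument list and does not run the tool.

```python
from typing import List, Optional


def build_load_command(
    files: List[str],
    repository: Optional[str] = None,
    config_file: Optional[str] = None,
    mode: Optional[str] = None,
    force: bool = False,
    partial_load: bool = False,
    stop_on_error: bool = False,
    verbose: bool = False,
    executable: str = "importrdf",  # assumption: the tool is on PATH
) -> List[str]:
    """Build the argument list for `importrdf load [option] [file]`."""
    if mode not in (None, "serial", "parallel"):
        raise ValueError("mode must be 'serial' or 'parallel'")

    cmd = [executable, "load"]
    if config_file is not None:
        cmd += ["--config-file", config_file]
    if repository is not None:
        cmd += ["--repository", repository]
    if mode is not None:
        # When --mode is omitted, the documentation says serial is used.
        cmd += ["--mode", mode]

    switches = {
        "--force": force,
        "--partial-load": partial_load,
        "--stop-on-error": stop_on_error,
        "--verbose": verbose,
    }
    cmd += [flag for flag, enabled in switches.items() if enabled]
    return cmd + list(files)


# Hypothetical repository and file names.
cmd = build_load_command(
    ["data.ttl"], repository="my-repo", mode="parallel", verbose=True
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tool is installed.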

Tip

For loading datasets larger than several billion RDF statements, consider using the Preload sub-command.

Preload command line options

Usage: importrdf preload [option] [file]

Option                           Short version  Description
-------------------------------  -------------  -----------
--iterator-cache <arg>           -a             Chunk iterator cache size. The value will be multiplied by 1,024. Default is auto, i.e., calculated by the tool.
--chunk <arg>                    -b             Chunk size for partial sorting of the queues. Use m for millions or k for thousands. Default is auto, i.e., calculated by the tool.
--config-file <file>             -c             Repository-defining .ttl file.
--force                          -f             Whether to overwrite the existing repository.
--help                           -h             Display this message and exit.
--id <repository-id>             -i             Existing repository ID.
--queue-folder <folder>          -q             Folder used to store temporary data.
--recursive                      -r             Whether to walk folders recursively.
--parsing-tasks <num>            -t             Number of RDF parsers.
--restart                        -x             Whether to restart the load, ignoring any existing recovery points.
--recovery-point-interval <sec>  -y             The interval, in seconds, at which recovery points are created.
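The size conventions for --chunk and --iterator-cache, and the way the Preload options fit together, can be sketched in Python as follows. The helper names, the repository ID `my-repo`, and the folder `data/` are hypothetical. Treating a bare number as a literal count is an assumption, since the option description only mentions the k and m suffixes. The sketch builds an argument list and does not run the tool.

```python
from typing import List, Optional

_SUFFIXES = {"k": 1_000, "m": 1_000_000}


def parse_chunk_size(value: str) -> Optional[int]:
    """Expand a --chunk value such as '500k' or '20m'; 'auto' yields None."""
    text = value.strip().lower()
    if text == "auto":
        return None  # size is calculated by the tool
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)  # assumption: a bare number is taken literally


def effective_iterator_cache(value: int) -> int:
    """The --iterator-cache value is multiplied by 1,024."""
    return value * 1024


def build_preload_command(
    files: List[str],
    repo_id: Optional[str] = None,
    config_file: Optional[str] = None,
    chunk: Optional[str] = None,
    iterator_cache: Optional[int] = None,
    queue_folder: Optional[str] = None,
    parsing_tasks: Optional[int] = None,
    recovery_point_interval: Optional[int] = None,
    force: bool = False,
    recursive: bool = False,
    restart: bool = False,
    executable: str = "importrdf",  # assumption: the tool is on PATH
) -> List[str]:
    """Build the argument list for `importrdf preload [option] [file]`."""
    cmd = [executable, "preload"]
    valued = {
        "--config-file": config_file,
        "--id": repo_id,
        "--chunk": chunk,
        "--iterator-cache": iterator_cache,
        "--queue-folder": queue_folder,
        "--parsing-tasks": parsing_tasks,
        "--recovery-point-interval": recovery_point_interval,
    }
    for flag, value in valued.items():
        if value is not None:
            cmd += [flag, str(value)]
    switches = {"--force": force, "--recursive": recursive, "--restart": restart}
    cmd += [flag for flag, enabled in switches.items() if enabled]
    return cmd + list(files)


print(parse_chunk_size("20m"))        # 20000000
print(effective_iterator_cache(512))  # 524288
preload_cmd = build_preload_command(
    ["data/"], repo_id="my-repo", chunk="20m", parsing_tasks=4, recursive=True
)
print(" ".join(preload_cmd))
```

Options left at `None` are omitted from the command, so the tool falls back to its own defaults, such as auto for the chunk and iterator cache sizes.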