# importrdf

The importrdf tool loads datasets offline. It supports two sub-commands: load and preload.

Usage: importrdf load [option] [file]

| Option | Short version | Description |
| --- | --- | --- |
| `--config-file <file>` | `-c` | Repository-defining .ttl file. |
| `--force` | `-f` | Whether to overwrite the existing repository. |
| `--help` | `-h` | Display this message and exit. |
| `--repository <repository-name>` | `-i` | Name of an existing repository. |
| `--mode <serial\|parallel>` | `-m` | The way the data is loaded in the repository: serial or parallel. |
| `--partial-load` | `-p` | Whether to allow partial load of a file that contains a corrupt line. |
| `--stop-on-error` | `-s` | Whether to stop the process if the dataset contains a corrupt file. |
| `--verbose` | `-v` | Whether to print metrics during load. |

Note

The --partial-load option loads data only up to the first corrupt line of the file.
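As a rough illustration of the note above, the following shell sketch assembles a load command that uses --partial-load. The repository name `my-repo` and the file `dataset.nt` are hypothetical placeholders, and the script only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to your own repository and data file.
REPO_NAME="my-repo"
DATA_FILE="dataset.nt"

# --partial-load: accept a file with a corrupt line and keep the data
# that was read before that line.
PARTIAL_CMD="importrdf load --repository $REPO_NAME --partial-load $DATA_FILE"

# Print the command for review; run it yourself once it looks right.
echo "$PARTIAL_CMD"
```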

The mode specifies the way the data is loaded in the repository:

• serial: parsing, entity resolution, load, and inference run one after another in a single thread.

• parallel: parsing, entity resolution, load, and inference run in multiple threads. This gives a significant boost when loading large datasets with inference enabled.

If no mode is selected, serial will be used.
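A minimal sketch of selecting the mode explicitly, assuming a repository configuration file named `repo-config.ttl` and a data file named `dataset.ttl` (both hypothetical). Only options listed in the table above are used, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to your own configuration and data file.
CONFIG_FILE="repo-config.ttl"
DATA_FILE="dataset.ttl"

# --mode parallel: multi-threaded parse, entity resolution, load, and inference.
# --verbose: print metrics during the load.
LOAD_CMD="importrdf load --config-file $CONFIG_FILE --mode parallel --verbose $DATA_FILE"

# Print the command for review; run it yourself once it looks right.
echo "$LOAD_CMD"
```

Omitting `--mode parallel` from the command falls back to serial loading, as stated above.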

Usage: importrdf preload [option] [file]

| Option | Short version | Description |
| --- | --- | --- |
| `--iterator-cache <arg>` | `-a` | Chunk iterator cache size. The value will be multiplied by 1,024. Default is auto, i.e., calculated by the tool. |
| `--chunk <arg>` | `-b` | Chunk size for partial sorting of the queues. Use m for millions or k for thousands. Default is auto, i.e., calculated by the tool. |
| `--config-file <file>` | `-c` | Repository-defining .ttl file. |
| `--force` | `-f` | Whether to overwrite the existing repository. |
| `--help` | `-h` | Display this message and exit. |
| `--id <repository-id>` | `-i` | Existing repository ID. |
| `--queue-folder <folder>` | `-q` | Folder used to store temporary data. |
| `--recursive` | `-r` | Whether to walk folders recursively. |
| | `-t` | Number of RDF parsers. |
| `--restart` | `-x` | Whether to restart the load, ignoring any existing recovery points. |
| `--recovery-point-interval <sec>` | `-y` | The interval, in seconds, at which recovery points are created. |
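
The preload options above can be combined as in the following sketch. The repository ID, queue folder, data folder, chunk size, and interval are illustrative assumptions, not recommended values; the script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to your own environment.
REPO_ID="my-repo"
QUEUE_DIR="/tmp/preload-queue"
DATA_DIR="./datasets"

# --recursive: walk DATA_DIR and its sub-folders.
# --chunk 20m: sort the queues in chunks of 20 million (m = millions).
# --queue-folder: where temporary data is stored.
# --recovery-point-interval 3600: create a recovery point every 3600 seconds.
PRELOAD_CMD="importrdf preload --id $REPO_ID --recursive --chunk 20m --queue-folder $QUEUE_DIR --recovery-point-interval 3600 $DATA_DIR"

# Print the command for review; run it yourself once it looks right.
echo "$PRELOAD_CMD"
```

Adding `--restart` to the same command would ignore any existing recovery points and start the load from the beginning.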