importrdf
The importrdf tool is used for offline loading of datasets. It supports two sub-commands: Load and Preload.
See more about loading data with ImportRDF here.
Load command line options
Usage: importrdf load [option] [file]
Option | Short version | Description |
---|---|---|
--config-file <file> | -c | Repository-defining |
--force | -f | Whether to overwrite the existing repository. |
--help | -h | Display this message and exit. |
--repository <repository-name> | -i | Name of an existing repository. |
--mode <serial\|parallel> | -m | Single-threaded (serial) or multi-threaded (parallel) mode for parse/load/infer. |
--partial-load | -p | Whether to allow partial load of a file that contains a corrupt line. |
--stop-on-error | -s | Whether to stop the process if the dataset contains a corrupt file. |
--verbose | -v | Whether to print metrics during load. |
Note
The --partial-load option loads data only up to the first corrupt line of the file.
The mode specifies how the data is loaded into the repository:
- serial: parsing is followed by entity resolution, then load, then inference, all done in a single thread.
- parallel: parsing, entity resolution, load, and inference are all multi-threaded. This gives a significant boost when loading large datasets with inference enabled.

If no mode is selected, serial is used.
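As a rough illustration of the mode option, the sketch below wraps an importrdf load call in a small dry-run helper, so the command line is printed for review instead of being executed. The helper `run`, the `DRY_RUN` variable, and the file names `repo-config.ttl` and `dataset.ttl` are assumptions made for this example; only the options themselves come from the table above.

```shell
# Dry-run helper (hypothetical, not part of importrdf):
# prints the command unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder file names; replace with your own configuration and data.
CONFIG=repo-config.ttl
DATA=dataset.ttl

# Multi-threaded load, printing metrics during the load.
run importrdf load --config-file "$CONFIG" --mode parallel --verbose "$DATA"

# Omitting --mode falls back to serial, as described above.
run importrdf load --config-file "$CONFIG" --verbose "$DATA"
```

Setting `DRY_RUN=0` makes the helper execute the command rather than print it, which assumes importrdf is on the `PATH`.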
Tip
For loading datasets larger than several billion RDF statements, consider using the Preload sub-command.
Preload command line options
Usage: importrdf preload [option] [file]
Option | Short version | Description |
---|---|---|
--iterator-cache <arg> | -a | Chunk iterator cache size. The value will be multiplied by 1,024. Default is |
--chunk <arg> | -b | Chunk size for partial sorting of the queues. Use |
--config-file <file> | -c | Repository-defining |
--force | -f | Whether to overwrite the existing repository. |
--help | -h | Display this message and exit. |
--id <repository-id> | -i | Existing repository ID. |
--queue-folder <folder> | -q | Folder used to store temporary data. |
--recursive | -r | Whether to walk folders recursively. |
--parsing-tasks <num> | -t | Number of RDF parsers. |
--restart | -x | Whether to restart the load, ignoring any existing recovery points. |
--recovery-point-interval <sec> | -y | The interval at which recovery points are created. |
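To show how several of these options might be combined, here is a minimal sketch that assembles a preload command line and prints it rather than running it. The function name `preload_cmd`, the paths, and the numeric values (4 parsers, a 3600-second recovery point interval) are illustrative assumptions, not recommended settings. The sketch also assumes that a folder can be passed as the input when --recursive is set.

```shell
# Print a preload command line for review instead of executing it.
# preload_cmd is a hypothetical helper; the values are placeholders.
preload_cmd() {
    config=$1      # repository configuration file
    queue_dir=$2   # folder for temporary data
    data_dir=$3    # folder containing the RDF files
    echo importrdf preload \
        --config-file "$config" \
        --queue-folder "$queue_dir" \
        --parsing-tasks 4 \
        --recovery-point-interval 3600 \
        --recursive \
        "$data_dir"
}

preload_cmd repo-config.ttl /tmp/importrdf-queue ./data
```

If an earlier run was interrupted and its recovery points should be ignored, adding --restart to the printed command would start the load from scratch.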