GraphDB command line tools

The GraphDB distribution includes a number of command line tools located in the bin directory. Their file extensions are .sh or empty for Linux/Unix, and .cmd for Windows.

console

This is an interactive console based on the RDF4J console. After starting it, enter help at the prompt for information about the operations you can perform.

Usage: console [OPTION] [repositoryID].

Use --help to see the available command line options, which are:

Option                  Description
-c,--cautious           Always answer no to (suppressed) confirmation prompts
-d,--dataDir <arg>      Sesame data directory to ‘connect’ to
-e,--echo               Echoes input back to stdout, useful for logging script sessions
-f,--force              Always answer yes to (suppressed) confirmation prompts
-h,--help               Print this help
-q,--quiet              Suppresses prompts, useful for scripting
-s,--serverURL <arg>    URL of Sesame server to connect to, e.g., http://localhost/openrdf-sesame/
-v,--version            Print version information
-x,--exitOnError        Immediately exit the console on the first error
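For example, a session connecting to a repository on a local server might look like this (the server URL and the repository ID myrepo are placeholders):

```shell
# Open the console against a local Sesame/RDF4J server and a repository
bin/console --serverURL http://localhost/openrdf-sesame/ myrepo
```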

generate-report

This tool generates a zip archive containing a report about a GraphDB server. It needs the server's process ID, which GraphDB writes to a PID file when started with graphdb -p <pidfile>.

Usage: generate-report <graphdb-pid> [<output-file>].

The available options are:

Option           Description
<graphdb-pid>    (Required) The process ID of a running GraphDB instance.
<output-file>    (Optional) The path of the file where the report should be saved. If this option is missing, the report will be saved in a file called graphdb-server-report.zip in the current directory.
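Assuming GraphDB was started with -p so its process ID is in a PID file, a report could be generated like this (both file paths are illustrative):

```shell
# Read the process ID from the PID file and write the report to /tmp
bin/generate-report $(cat graphdb.pid) /tmp/graphdb-server-report.zip
```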

graphdb

The graphdb command line tool starts the database. It supports the following options:

Option        Description
-d            Daemonize (run in background); not available on Windows
-s            Run in server-only mode (no Workbench UI)
-p pidfile    Write PID to pidfile
-h, --help    Print command line options
-v            Print GraphDB version, then exit
-Dprop        Set Java system property
-Xprop        Set non-standard Java system property

Note

Run graphdb -s to start GraphDB in server-only mode without the web interface (no Workbench). A remote Workbench can still be attached to the instance.
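As a sketch, a startup combining several of these options might look like the following (the PID file path and the system property value are placeholders):

```shell
# Start GraphDB in server-only mode, as a background daemon,
# writing the process ID to a PID file
bin/graphdb -s -d -p /var/run/graphdb.pid -Dgraphdb.home=/data/graphdb
```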

importrdf

The importrdf tool is used for offline loading of datasets. It supports two sub-commands — Load and Preload.

See more about using this utility in Loading data using the ImportRDF tool.

Load command line options

Usage: importrdf load [option] [file]

Option                            Short version    Description
--config-file <file>              -c               Repository-defining .ttl file.
--force                           -f               Whether to overwrite the existing repository.
--help                            -h               Display this message and exit.
--repository <repository-name>    -i               Name of an existing repository.
--mode <serial|parallel>          -m               Single-threaded (serial) or multi-threaded (parallel) mode for parse/load/infer.
--partial-load                    -p               Whether to allow partial load of a file that contains a corrupt line.
--stop-on-error                   -s               Whether to stop the process if the dataset contains a corrupt file.
--verbose                         -v               Whether to print metrics during load.

Note

The --partial-load option loads data up to the first corrupt line of the file.

The mode specifies how the data is loaded into the repository:

  • serial: parsing, followed by entity resolution, then load, then inference, all done in a single thread.

  • parallel: multi-threaded parse, entity resolution, load, and inference. This gives a significant boost when loading large datasets with enabled inference.

If no mode is selected, serial is used.
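For instance, a hypothetical parallel load of a dataset into a repository defined by a config file (both file names are placeholders):

```shell
# Load statements.nt in multi-threaded mode into the repository
# described by repo-config.ttl
bin/importrdf load -c repo-config.ttl -m parallel statements.nt
```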

Tip

For loading datasets larger than several billion RDF statements, consider using the Preload sub-command.

Preload command line options

Usage: importrdf preload [option] [file]

Option                             Short version    Description
--iterator-cache <arg>             -a               Chunk iterator cache size. The value will be multiplied by 1,024. Default is auto, i.e., calculated by the tool.
--chunk <arg>                      -b               Chunk size for partial sorting of the queues. Use m for millions or k for thousands. Default is auto, i.e., calculated by the tool.
--config-file <file>               -c               Repository-defining .ttl file.
--force                            -f               Whether to overwrite the existing repository.
--help                             -h               Display this message and exit.
--id <repository-id>               -i               Existing repository ID.
--queue-folder <folder>            -q               Folder used to store temporary data.
--recursive                        -r               Whether to walk folders recursively.
--parsing-tasks <num>              -t               Number of RDF parsers.
--restart                          -x               Whether to restart the load, ignoring any existing recovery points.
--recovery-point-interval <sec>    -y               The interval at which recovery points are created.
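A hypothetical preload invocation that walks a data folder recursively and keeps the temporary queues on a separate disk (all paths are placeholders):

```shell
# Bulk-load every RDF file under /data/rdf/ into the repository
# described by repo-config.ttl, using /mnt/fast-disk/queues for
# temporary sort queues
bin/importrdf preload -c repo-config.ttl -r -q /mnt/fast-disk/queues /data/rdf/
```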

rdfvalidator

Used for validating RDF files.

Usage: rdfvalidator <input-folder-or-file-with-rdf-files>.

reification-convert

This tool converts standard RDF reification to RDF-star. The output file must be in an RDF-star format.

Usage: reification-convert [--relaxed] <input-file1> [<input-file2> ...] <output-file>.

Available options:

Option       Description
--relaxed    Enables relaxed mode, where x a rdf:Statement is not required.
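For example, converting two hypothetical input files into a single Turtle-star output (file names are placeholders; .ttls is the Turtle-star extension):

```shell
# Merge and convert two reified files into one RDF-star file
bin/reification-convert --relaxed reified1.ttl reified2.ttl output.ttls
```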

rule-compiler

Usage: rule-compiler <rules.pie> <java-class-name> <output-class-file> [<partial>].

The .pie file format is described more in Custom rulesets.

Available options:

Option                 Description
<rules.pie>            The name of the rule .pie file
<java-class-name>      The name of the Java class
<output-class-file>    The output file name
[<partial>]            (Optional)
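A hypothetical invocation compiling a custom ruleset (all names are placeholders):

```shell
# Compile my-rules.pie into a Java class named MyRules
bin/rule-compiler my-rules.pie MyRules MyRules.class
```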

storage-tool

The storage-tool is an application for scanning and repairing a GraphDB repository.

To run it, execute the bin/storage-tool script in the GraphDB distribution folder.

For help, run:

bin/storage-tool --help

Note

The tool works only on repository images that are not in use (i.e., when the database is down).

Supported commands

Command    Description
scan       Scans repository index(es) and prints statistics for the number of statements and repository consistency.
rebuild    Uses the source index (src-index) to rebuild the destination index (dest-index). If src-index = dest-index, compacts dest-index. If src-index is missing and dest-index = predicates, it just rebuilds dest-index.
replace    Replaces an existing entity origin-uri with a non-existing one repl-uri.
repair     Repairs the repository indexes and restores data; a more reliable variant of the index merge.
export     Uses the source index (src-index) to export repository data to the destination file (dest-file). Supported destination file extensions: .trig, .ttl, .nq.
epool      Scans the entity pool for inconsistencies and checks for invalid IRIs. IRIs are validated against the RFC 3987 standard. Invalid IRIs are listed in an entities.invalid.log file for review. If --fix is specified, the invalid IRIs are fixed in the entity pool instead of listed.
--help     Prints command-specific help messages.

Options

Parameter                   Short version    Description                                                                            Default value
--storage                   -s               (Required) Absolute path to the repository storage directory                           null
--help                      -h               Prints out help messages
--src-index                 -r               Predicate collection to be used as source. Can be one of: pso, pos.                    null
--dest-index                -d               Predicate collection to be used as destination. Can be one of: pso, pos, cpso,        null
                                             predicates.
--origin-uri                -o               Original existing URI in the repository to be replaced                                 null
--repl-uri                  -n               New non-existing URI in the repository to replace the original                         null
--dest-file                 -f               Path to the file used to store exported data. Supported formats: .trig, .ttl, .nq.    null
--fix                       -x               Lists or fixes ePool problems.                                                         false
--check-pred-statistics     -c               Runs an additional check of the predicate statistics                                   false
--status-print-interval     -i               Interval between status messages (in seconds)                                          30 (30 seconds)
--page-cache-size           -p               Size of the page cache (in thousands of elements)                                      10 (10,000 elements)
--positive-filter-status    -v               Optional statement status filter during export                                         -1 (no filter)
--sort-buffer-size          -b               Size of the external sort buffer (in millions of elements)                             100 (also the maximum)

Examples

  • scan the repository, print statement statistics and repository consistency status:

    bin/storage-tool scan --storage /<path-to-repo>/storage
    
    • when everything is OK

        Scan result consistency check!
    
    _______________________scan results_______________________
    mask |              pso |              pos | diff | flags
    0001 |       29,937,266 |       29,937,266 |   OK | INF
    0002 |       61,251,058 |       61,251,058 |   OK | EXP
    0005 |              145 |              145 |   OK | INF RO
    0006 |            8,134 |            8,134 |   OK | EXP RO
    0009 |        1,661,585 |        1,661,585 |   OK | INF HID
    000a |        2,834,694 |        2,834,694 |   OK | EXP HID
    0011 |        1,601,875 |        1,601,875 |   OK | INF EQ
    0012 |        1,934,013 |        1,934,013 |   OK | EXP EQ
    0020 |              309 |              221 |   OK | DEL
    0021 |               15 |               23 |   OK | INF DEL
    0022 |               34 |               30 |   OK | EXP DEL
    _______________________additional checks_______________________
         |              pso |              pos | stat | check-type
         |         59b30d4d |         59b30d4d |   OK | checksum
         |                0 |                0 |   OK | not existing ids
         |                0 |                0 |   OK | literals as subjects
         |                0 |                0 |   OK | literals as predicates
         |                0 |                0 |   OK | literals as contexts
         |                0 |                0 |   OK | blanks as predicates
         |             true |             true |   OK | page consistency
         |         80b9ad24 |         80b9ad24 |   OK | cpso crc
         |                - |                - |   OK | epool duplicate ids
         |                - |                - |   OK | epool consistency
         |                - |                - |   OK | literal index consistency
         |                - |                - |   OK | triple entity index consistency
    
    Scan determines that this repo image is consistent.
    
    • when there are broken indexes

        _______________________scan results_______________________
    mask |              pso |              pos | diff | flags
    0001 |       29,284,580 |       29,284,580 |   OK | INF
    0002 |       63,559,252 |       63,559,252 |   OK | EXP
    0004 |            8,134 |            8,134 |   OK | RO
    0005 |            1,140 |            1,140 |   OK | INF RO
    0009 |        1,617,004 |        1,617,004 |   OK | INF HID
    000a |        3,068,289 |        3,068,289 |   OK | EXP HID
    0011 |        1,599,375 |        1,599,375 |   OK | INF EQ
    0012 |        2,167,536 |        2,167,536 |   OK | EXP EQ
    0020 |              327 |              254 |   OK | DEL
    0021 |               11 |               12 |   OK | INF DEL
    0022 |               31 |               24 |   OK | EXP DEL
    004a |               17 |               17 |   OK | EXP HID MRK
    
    _______________________additional checks_______________________
        |              pso |              pos | stat | check-type
        | ffffffff93e6a372 | ffffffff93e6a372 |   OK | checksum
        |                0 |                0 |   OK | not existing ids
        |                0 |                0 |   OK | literals as subjects
        |                0 |                0 |   OK | literals as predicates
        |                0 |                0 |   OK | literals as contexts
        |                0 |                0 |   OK | blanks as predicates
        |             true |             true |   OK | page consistency
        |         bf55ab00 |         bf55ab00 |   OK | cpso crc
        |                - |                - |   OK | epool duplicate ids
        |                - |                - |   OK | epool consistency
        |                - |                - |  ERR | literal index consistency
    
    Scan determines that this repo image is INCONSISTENT.
    

    The literal index contains more statements than there are literals in the entity pool, so you have to rebuild it.

  • scan the PSO index and print a status message every 60 seconds:

    bin/storage-tool scan --storage /<path-to-repo>/storage --src-index=pso --status-print-interval=60
    
  • compact the PSO index (self-rebuild equals compacting):

    bin/storage-tool rebuild --storage /<path-to-repo>/storage --src-index=pso --dest-index=pso
    
  • rebuild the POS index from the PSO index and compact POS:

    bin/storage-tool rebuild --storage /<path-to-repo>/storage --src-index=pso --dest-index=pos
    
  • rebuild the predicates statistics index:

    bin/storage-tool rebuild --storage /<path-to-repo>/storage --dest-index=predicates
    
  • replace http://onto.com#e1 with http://onto.com#e2:

    bin/storage-tool replace --storage /<path-to-repo>/storage --origin-uri="<http://onto.com#e1>" --repl-uri="<http://onto.com#e2>"
    
  • dump the repository data using the POS index into a f.trig file:

    bin/storage-tool export --storage /<path-to-repo>/storage --src-index=pos --dest-file=/repo/storage/f.trig
    
  • scan the entity pool and create a report with invalid IRIs, if such exist:

    bin/storage-tool epool --storage /<path-to-repo>/storage