storage-tool


The storage-tool is an application for scanning and repairing a GraphDB repository.

To run it, execute the bin/storage-tool script in the GraphDB distribution folder.

For help, run:

bin/storage-tool --help

Note

The tool works only on repository images that are not in use (i.e., when the database is down).

Supported commands

  • scan: Scans the repository index(es) and prints statistics on the number of statements and on repository consistency.

  • rebuild: Uses the source index (src-index) to rebuild the destination index (dest-index). If src-index = dest-index, it compacts dest-index. If src-index is omitted and dest-index = predicates, it just rebuilds dest-index.

  • replace: Replaces an existing entity (origin-uri) with a non-existing one (repl-uri).

  • repair: Repairs the repository indexes and restores data; a better variant of the merge index.

  • export: Uses the source index (src-index) to export repository data to the destination file (dest-file). Supported destination file extensions: .trig, .ttl, .nq.

  • epool: Scans the entity pool for inconsistencies and checks for invalid IRIs, which are validated against the RFC 3987 standard. Invalid IRIs are listed in an entities.invalid.log file for review. If --fix is specified, the invalid IRIs are fixed in the entity pool instead of being listed.

  • --help: Prints command-specific help messages.
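The rebuild rules above amount to a small decision table. The POSIX shell sketch below restates them so they can be checked at a glance; describe_rebuild is a hypothetical helper, not part of GraphDB, and it only prints what a rebuild would do without calling storage-tool. Treating an omitted src-index with any dest-index other than predicates as an error is an assumption of this sketch, not documented behavior.

```shell
# Hypothetical helper (not part of GraphDB): restates the documented
# rebuild rules; it prints what a rebuild would do and runs nothing.
# Usage: describe_rebuild <src-index, or "" if omitted> <dest-index>
describe_rebuild() {
    src=$1
    dest=$2
    if [ -z "$src" ]; then
        # Without a source index, only the predicates index is rebuilt.
        if [ "$dest" = "predicates" ]; then
            echo "rebuild predicates"
            return 0
        fi
        # Assumption of this sketch: other targets need a source index.
        echo "error: no --src-index given for --dest-index=$dest" >&2
        return 1
    fi
    if [ "$src" = "$dest" ]; then
        # Rebuilding an index from itself compacts it.
        echo "compact $dest"
    else
        # Rebuilding from another index also compacts the destination.
        echo "rebuild $dest from $src"
    fi
}
```

For example, describe_rebuild pso pos prints "rebuild pos from pso", which corresponds to the POS rebuild command in the Examples section.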

Options

  • --storage, -s: (required) Absolute path to the repository storage directory. Default: null.

  • --help, -h: Prints out help messages.

  • --src-index, -r: Predicate collection to be used as the source. One of pso, pos. Default: null.

  • --dest-index, -d: Predicate collection to be used as the destination. One of pso, pos, cpso, predicates. Default: null.

  • --origin-uri, -o: Original, existing URI in the repository to be replaced. Default: null.

  • --repl-uri, -n: New, non-existing URI in the repository that replaces the original. Default: null.

  • --dest-file, -f: Path to the file used to store exported data. Supported formats: .trig, .ttl, .nq. Default: null.

  • --fix, -x: Fixes entity pool (epool) problems instead of only listing them. Default: false.

  • --check-pred-statistics, -c: Runs an additional check of the predicate statistics. Default: false.

  • --status-print-interval, -i: Interval between status messages, in seconds. Default: 30 (30 seconds).

  • --page-cache-size, -p: Size of the page cache, in thousands of elements. Default: 10 (10,000 elements).

  • --positive-filter-status, -v: Optional statement status filter applied during export. Default: -1 (no filter).

  • --sort-buffer-size, -b: Size of the external sort buffer, in millions of elements; the maximum value is 100. Default: 100 (100 million elements).
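A wrapper script can validate these values before starting a long run on a large repository. The POSIX shell sketch below encodes only the rules listed in the options above: the allowed index names, the supported export extensions, and the units of --page-cache-size (thousands) and --sort-buffer-size (millions, capped at 100). All function names are hypothetical, not part of GraphDB, and rounding an element count up to the next whole unit is a choice of this sketch.

```shell
# Hypothetical pre-flight helpers (not part of GraphDB). They encode
# only the documented option rules so a wrapper can fail early.

# Succeeds if $1 is an allowed --src-index value.
valid_src_index() {
    case $1 in
        pso|pos) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if $1 is an allowed --dest-index value.
valid_dest_index() {
    case $1 in
        pso|pos|cpso|predicates) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the --dest-file path has a supported export extension.
valid_dest_file() {
    case $1 in
        *.trig|*.ttl|*.nq) return 0 ;;
        *) return 1 ;;
    esac
}

# --page-cache-size is given in thousands of elements;
# converts an element count to the option value, rounding up.
page_cache_arg() {
    echo $(( ($1 + 999) / 1000 ))
}

# --sort-buffer-size is given in millions of elements, capped at 100;
# converts an element count to the option value, rounding up.
sort_buffer_arg() {
    v=$(( ($1 + 999999) / 1000000 ))
    if [ "$v" -gt 100 ]; then
        v=100
    fi
    echo "$v"
}
```

For instance, page_cache_arg 50000 prints 50, a value that could then be passed as --page-cache-size=50.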

Examples

  • scan the repository, print statement statistics and repository consistency status:

    bin/storage-tool scan --storage /<path-to-repo>/storage
    
    • when everything is OK

    Scan result consistency check!
    
    _______________________scan results_______________________
    mask |              pso |              pos | diff | flags
    0001 |       29,937,266 |       29,937,266 |   OK | INF
    0002 |       61,251,058 |       61,251,058 |   OK | EXP
    0005 |              145 |              145 |   OK | INF RO
    0006 |            8,134 |            8,134 |   OK | EXP RO
    0009 |        1,661,585 |        1,661,585 |   OK | INF HID
    000a |        2,834,694 |        2,834,694 |   OK | EXP HID
    0011 |        1,601,875 |        1,601,875 |   OK | INF EQ
    0012 |        1,934,013 |        1,934,013 |   OK | EXP EQ
    0020 |              309 |              221 |   OK | DEL
    0021 |               15 |               23 |   OK | INF DEL
    0022 |               34 |               30 |   OK | EXP DEL
    _______________________additional checks_______________________
         |              pso |              pos | stat | check-type
         |         59b30d4d |         59b30d4d |   OK | checksum
         |                0 |                0 |   OK | not existing ids
         |                0 |                0 |   OK | literals as subjects
         |                0 |                0 |   OK | literals as predicates
         |                0 |                0 |   OK | literals as contexts
         |                0 |                0 |   OK | blanks as predicates
         |             true |             true |   OK | page consistency
         |         80b9ad24 |         80b9ad24 |   OK | cpso crc
         |                - |                - |   OK | epool duplicate ids
         |                - |                - |   OK | epool consistency
         |                - |                - |   OK | literal index consistency
         |                - |                - |   OK | triple entity index consistency
    
    Scan determines that this repo image is consistent.
    
    • when there are broken indexes

    _______________________scan results_______________________
    mask |              pso |              pos | diff | flags
    0001 |       29,284,580 |       29,284,580 |   OK | INF
    0002 |       63,559,252 |       63,559,252 |   OK | EXP
    0004 |            8,134 |            8,134 |   OK | RO
    0005 |            1,140 |            1,140 |   OK | INF RO
    0009 |        1,617,004 |        1,617,004 |   OK | INF HID
    000a |        3,068,289 |        3,068,289 |   OK | EXP HID
    0011 |        1,599,375 |        1,599,375 |   OK | INF EQ
    0012 |        2,167,536 |        2,167,536 |   OK | EXP EQ
    0020 |              327 |              254 |   OK | DEL
    0021 |               11 |               12 |   OK | INF DEL
    0022 |               31 |               24 |   OK | EXP DEL
    004a |               17 |               17 |   OK | EXP HID MRK
    
    _______________________additional checks_______________________
        |              pso |              pos | stat | check-type
        | ffffffff93e6a372 | ffffffff93e6a372 |   OK | checksum
        |                0 |                0 |   OK | not existing ids
        |                0 |                0 |   OK | literals as subjects
        |                0 |                0 |   OK | literals as predicates
        |                0 |                0 |   OK | literals as contexts
        |                0 |                0 |   OK | blanks as predicates
        |             true |             true |   OK | page consistency
        |         bf55ab00 |         bf55ab00 |   OK | cpso crc
        |                - |                - |   OK | epool duplicate ids
        |                - |                - |   OK | epool consistency
        |                - |                - |  ERR | literal index consistency
    
    Scan determines that this repo image is INCONSISTENT.
    

    Here, the literal index contains more statements than there are literals in the epool, and it has to be rebuilt:

  • scan the PSO index and print a status message every 60 seconds:

    bin/storage-tool scan --storage /<path-to-repo>/storage --src-index=pso --status-print-interval=60
    
  • compact the PSO index (self-rebuild equals compacting):

    bin/storage-tool rebuild --storage /<path-to-repo>/storage --src-index=pso --dest-index=pso
    
  • rebuild the POS index from the PSO index and compact POS:

    bin/storage-tool rebuild --storage /<path-to-repo>/storage --src-index=pso --dest-index=pos
    
  • rebuild the predicates statistics index:

    bin/storage-tool rebuild --storage /<path-to-repo>/storage --dest-index=predicates
    
  • replace http://onto.com#e1 with http://onto.com#e2:

    bin/storage-tool replace --storage /<path-to-repo>/storage --origin-uri="<http://onto.com#e1>" --repl-uri="<http://onto.com#e2>"
    
  • dump the repository data into the f.trig file, using the POS index as the source:

    bin/storage-tool export --storage /<path-to-repo>/storage --src-index=pos --dest-file=/repo/storage/f.trig
    
  • scan the entity pool and create a report with invalid IRIs, if such exist:

    bin/storage-tool epool --storage /<path-to-repo>/storage
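The scan reports in the first example are pipe-separated tables, so a script can pick out the rows that are not OK. The sketch below assumes exactly the layout shown in those reports and that the report has been captured from the tool's output (for example, redirected to a file); scan_problems and scan_is_consistent are hypothetical helpers, not part of GraphDB.

```shell
# Hypothetical helper (not part of GraphDB): reads a scan report on
# stdin and prints every table row whose diff/stat column is not OK.
# Assumes the pipe-separated layout of the example reports.
scan_problems() {
    awk -F'|' '
        # Trim leading and trailing blanks from a field.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF == 5 {
            status = trim($4)
            # Skip OK rows and the two header rows (diff and stat columns).
            if (status == "OK" || status == "diff" || status == "stat") next
            # Label with the mask for statement rows, else the check type.
            label = trim($1)
            if (label == "") label = trim($5)
            print label ": " status
        }
    '
}

# Succeeds only if the report on stdin contains the "consistent" verdict
# (case-sensitive, so "INCONSISTENT" does not match).
scan_is_consistent() {
    grep -qF "this repo image is consistent"
}
```

Fed the broken-index report above, scan_problems would print "literal index consistency: ERR", and scan_is_consistent would fail.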