Storage tool

The Storage Tool is an application for scanning and repairing a GraphDB repository. To run it, execute bin/storage-tool in the GraphDB distribution folder. For help, run bin/storage-tool --help.

Note

The tool works only on repository images that are not in use (i.e., when the database is down).

Options

Parameter            | Description                                                   | Default value
---------------------+---------------------------------------------------------------+------------------------------------------
command              | Operation to be executed (mandatory)                          |
storage              | Absolute path to the repository storage directory (mandatory) |
esize                | Size of entity pool IDs, in bits: 32 or 40                    | 32
statusPrintInterval  | Interval between status messages                              | 30 (seconds)
pageCacheSize        | Size of the page cache                                        | 10 (i.e., 10,000 elements)
sortBufferSize       | Size of the external sort buffer                              | 95 (i.e., 95M elements; also the maximum)
positiveFilterStatus | Optional statement status filter during export                | -1 (no filter)
srcIndex             | One of pso, pos                                               |
destIndex            | One of pso, pos, cpso, predicates                             |
origURI              | Original URI; must already exist in the repository            |
replURI              | New URI; must not yet exist in the repository                 |
destFile             | Path to the file used to store the exported data              |
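The Options table imposes a few hard constraints: command and storage are mandatory, storage is an absolute path, esize is 32 or 40, and sortBufferSize is at most 95. The shell sketch below checks these before assembling a command line. The helper name build_storage_tool_cmd is hypothetical and not part of GraphDB; it only prints the resulting command for review, so it never touches the repository, and it assumes Unix-style absolute paths.

```shell
# build_storage_tool_cmd is a hypothetical helper, not part of GraphDB.
# It checks the option constraints from the Options table and prints the
# resulting command line for review instead of running it.
build_storage_tool_cmd() {
  local op= storage= esize= sort_buf= extra= arg
  for arg in "$@"; do
    case "$arg" in
      -command=*)        op=${arg#-command=} ;;
      -storage=*)        storage=${arg#-storage=} ;;
      -esize=*)          esize=${arg#-esize=}; extra="$extra $arg" ;;
      -sortBufferSize=*) sort_buf=${arg#-sortBufferSize=}; extra="$extra $arg" ;;
      *)                 extra="$extra $arg" ;;  # passed through unchecked
    esac
  done

  # -command and -storage are mandatory
  if [ -z "$op" ] || [ -z "$storage" ]; then
    echo "error: -command and -storage are mandatory" >&2
    return 1
  fi
  # -storage must be an absolute path (Unix-style paths assumed)
  case "$storage" in
    /*) ;;
    *) echo "error: -storage must be an absolute path" >&2; return 1 ;;
  esac

  # -esize: 32 or 40 bits (the tool defaults to 32 when omitted)
  case "$esize" in
    ''|32|40) ;;
    *) echo "error: -esize must be 32 or 40" >&2; return 1 ;;
  esac

  # -sortBufferSize: millions of elements, at most 95
  if [ -n "$sort_buf" ]; then
    case "$sort_buf" in
      *[!0-9]*) echo "error: -sortBufferSize must be a whole number" >&2; return 1 ;;
    esac
    if ! [ "$sort_buf" -le 95 ] 2>/dev/null; then
      echo "error: -sortBufferSize cannot exceed 95 (95M elements)" >&2
      return 1
    fi
  fi

  printf 'bin/storage-tool -command=%s -storage=%s%s\n' "$op" "$storage" "$extra"
}
```

For instance, build_storage_tool_cmd -command=scan -storage=/repo/storage -esize=40 prints bin/storage-tool -command=scan -storage=/repo/storage -esize=40, whereas -esize=48 or -sortBufferSize=96 makes it return a non-zero status. Options other than esize and sortBufferSize are passed through without validation.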

Supported commands

Command    | Description
-----------+------------------------------------------------------------
scan       | Scans the repository index(es) and prints statistics on the number of statements and on repository consistency.
rebuild    | Uses the source index (srcIndex) to rebuild the destination index (destIndex). If srcIndex = destIndex, compacts destIndex. If srcIndex is omitted and destIndex = predicates, simply rebuilds destIndex.
replace    | Replaces an existing entity (origURI) with a non-existing one (replURI).
repair     | Repairs the repository indexes and restores data; a better variant of mergeindex.
mergeindex | Merges the pso and pos indexes (takes their union) and rebuilds the context indexes, if any. Note that no data backup is made.
export     | Uses the source index (srcIndex) to export repository data to the destination file (destFile). Supported destination file extensions: .trig, .ttl, .nq.
epool      | Scans the entity pool for inconsistencies and checks for invalid IRIs, which are validated against the RFC 3987 standard. Invalid IRIs are listed in an entities.invalid.log file for review. If -fix is specified, the invalid IRIs are fixed in the entity pool instead of being listed.
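The export command accepts only three destination file extensions. A minimal pre-flight check is sketched below; check_export_dest is a hypothetical helper, not part of the tool, and it performs a plain suffix match only (it does not verify that the target directory exists or is writable).

```shell
# check_export_dest is a hypothetical helper, not part of GraphDB.
# It accepts a destFile path only if it ends in one of the extensions
# listed for the export command: .trig, .ttl or .nq.
check_export_dest() {
  local dest="$1"
  case "$dest" in
    *.trig|*.ttl|*.nq) return 0 ;;
    *)
      echo "error: unsupported export format for '$dest' (use .trig, .ttl or .nq)" >&2
      return 1
      ;;
  esac
}

# Hypothetical usage: only invoke the tool when the destination is acceptable.
# dest=/repo/storage/f.trig
# check_export_dest "$dest" && bin/storage-tool -command=export \
#     -storage=/repo/storage -srcIndex=pos -destFile="$dest"
```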

Examples

  • scan the repository, print statement statistics and repository consistency status:

    bin/storage-tool -command=scan -storage=/repo/storage
    
    • when everything is OK

    Scan result consistency check!
    
    _______________________scan results_______________________
    mask |              pso |              pos | diff | flags
    0001 |       29,937,266 |       29,937,266 |   OK | INF
    0002 |       61,251,058 |       61,251,058 |   OK | EXP
    0005 |              145 |              145 |   OK | INF RO
    0006 |            8,134 |            8,134 |   OK | EXP RO
    0009 |        1,661,585 |        1,661,585 |   OK | INF HID
    000a |        2,834,694 |        2,834,694 |   OK | EXP HID
    0011 |        1,601,875 |        1,601,875 |   OK | INF EQ
    0012 |        1,934,013 |        1,934,013 |   OK | EXP EQ
    0020 |              309 |              221 |   OK | DEL
    0021 |               15 |               23 |   OK | INF DEL
    0022 |               34 |               30 |   OK | EXP DEL
    _______________________additional checks_______________________
         |              pso |              pos | stat | check-type
         |         59b30d4d |         59b30d4d |   OK | checksum
         |                0 |                0 |   OK | not existing ids
         |                0 |                0 |   OK | literals as subjects
         |                0 |                0 |   OK | literals as predicates
         |                0 |                0 |   OK | literals as contexts
         |                0 |                0 |   OK | blanks as predicates
         |             true |             true |   OK | page consistency
         |         80b9ad24 |         80b9ad24 |   OK | cpso crc
         |                - |                - |   OK | epool duplicate ids
         |                - |                - |   OK | epool consistency
         |                - |                - |   OK | literal index consistency
         |                - |                - |   OK | triple entity index consistency
    
    Scan determines that this repo image is consistent.
    
    • when there are broken indexes

    _______________________scan results_______________________
    mask |              pso |              pos | diff | flags
    0001 |       29,284,580 |       29,284,580 |   OK | INF
    0002 |       63,559,252 |       63,559,252 |   OK | EXP
    0004 |            8,134 |            8,134 |   OK | RO
    0005 |            1,140 |            1,140 |   OK | INF RO
    0009 |        1,617,004 |        1,617,004 |   OK | INF HID
    000a |        3,068,289 |        3,068,289 |   OK | EXP HID
    0011 |        1,599,375 |        1,599,375 |   OK | INF EQ
    0012 |        2,167,536 |        2,167,536 |   OK | EXP EQ
    0020 |              327 |              254 |   OK | DEL
    0021 |               11 |               12 |   OK | INF DEL
    0022 |               31 |               24 |   OK | EXP DEL
    004a |               17 |               17 |   OK | EXP HID MRK
    
    _______________________additional checks_______________________
         |              pso |              pos | stat | check-type
         | ffffffff93e6a372 | ffffffff93e6a372 |   OK | checksum
         |                0 |                0 |   OK | not existing ids
         |                0 |                0 |   OK | literals as subjects
         |                0 |                0 |   OK | literals as predicates
         |                0 |                0 |   OK | literals as contexts
         |                0 |                0 |   OK | blanks as predicates
         |             true |             true |   OK | page consistency
         |         bf55ab00 |         bf55ab00 |   OK | cpso crc
         |                - |                - |   OK | epool duplicate ids
         |                - |                - |   OK | epool consistency
         |                - |                - |  ERR | literal index consistency
    
    Scan determines that this repo image is INCONSISTENT.
    

    Here, the literals index contains more statements than there are literals in the entity pool (epool), so the literals index has to be rebuilt.

  • scan the PSO index of a 40-bit repository, printing a status message every 60 seconds:

    bin/storage-tool -command=scan -storage=/repo/storage -srcIndex=pso -esize=40 -statusPrintInterval=60
    
  • compact the PSO index (self-rebuild equals compacting):

    bin/storage-tool -command=rebuild -storage=/repo/storage -esize=40 -srcIndex=pso -destIndex=pso
    
  • rebuild the POS index from the PSO index and compact POS:

    bin/storage-tool -command=rebuild -storage=/repo/storage -esize=40 -srcIndex=pso -destIndex=pos
    
  • rebuild the predicates statistics index:

    bin/storage-tool -command=rebuild -storage=/repo/storage -esize=40 -destIndex=predicates
    
  • replace http://onto.com#e1 with http://onto.com#e2:

    bin/storage-tool -command=replace -storage=/repo/storage -origURI="<http://onto.com#e1>" -replURI="<http://onto.com#e2>"
    
  • dump the repository data using the POS index into the file f.trig:

    bin/storage-tool -command=export -storage=/repo/storage -srcIndex=pos -destFile=/repo/storage/f.trig
    
  • scan the entity pool and create a report of invalid IRIs, if any exist:

    bin/storage-tool -command=epool -storage=/repo/storage -esize=40
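The scan reports shown in the examples can also be checked from a script. The shell sketch below assumes the report layout of those samples; decode_mask and check_scan_output are hypothetical helpers, not part of GraphDB. The mask-to-flag mapping (0x01 INF, 0x02 EXP, 0x04 RO, 0x08 HID, 0x10 EQ, 0x20 DEL, 0x40 MRK) is inferred from the sample rows (for example, 004a = 0x40 + 0x08 + 0x02 = EXP HID MRK), not taken from a specification, and check_scan_output only recognises the ERR marker and the "INCONSISTENT" summary line seen there.

```shell
# decode_mask and check_scan_output are hypothetical helpers, not part of GraphDB.

# Decode a hexadecimal mask from the "scan results" table into flag names.
# The bit-to-flag mapping is inferred from the sample output, e.g.
# 004a = 0x40 + 0x08 + 0x02 -> EXP HID MRK. Bits above 0x40 are ignored.
decode_mask() {
  local mask=$((0x$1)) out= bit=1 name
  for name in INF EXP RO HID EQ DEL MRK; do
    if [ $((mask & bit)) -ne 0 ]; then
      out="${out:+$out }$name"
    fi
    bit=$((bit * 2))
  done
  printf '%s\n' "$out"
}

# Read a scan report on stdin, print every table row whose fourth
# |-separated column (diff or stat) reads ERR, and return 1 when the
# summary line reports an INCONSISTENT repository image.
check_scan_output() {
  local report
  report=$(cat)
  printf '%s\n' "$report" |
    awk -F'|' '{ s = $4; gsub(/[ \t]/, "", s); if (s == "ERR") print }'
  if printf '%s\n' "$report" | grep -q 'repo image is INCONSISTENT'; then
    return 1
  fi
  return 0
}
```

Assuming the tool writes its report to standard output, a pipeline such as bin/storage-tool -command=scan -storage=/repo/storage | check_scan_output lists the failing checks and returns a non-zero status when the image is inconsistent.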