Storage tool

The Storage Tool is an application for scanning and repairing a GraphDB repository. To run the Storage Tool, execute bin/storage-tool in the GraphDB distribution folder. For help, run ./storage-tool --help.

Note

The tool works only on repository images that are not in use (i.e., when the database is down).

Options

-command=<operation to be executed, MANDATORY>
-storage=<absolute path to repo storage dir, MANDATORY>
-esize=<size of entity pool IDs: 32 or 40 bits, DEFAULT 32>
-statusPrintInterval=<interval between status messages, in seconds>
-pageCacheSize=<size of the page cache, DEFAULT 10, means 10K elements>
-sortBufferSize=<size of the external sort buffer, DEFAULT 100, means 100M elements>
-srcIndex=<one of pso, pos, pcso, pcos>
-destIndex=<one of pso, pos, pcso, pcos, predicates>
-origURI=<original existing URI in the repo>
-replURI=<new non-existing URI in the repo>
-destFile=<path to file used to store exported data>
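
All options use the single-dash -name=value form. The Python sketch below assembles such an argument list and checks the value ranges stated above (-esize of 32 or 40, the allowed index names). The helper name build_args and the idea of driving the tool from Python are illustrative assumptions, not part of GraphDB.

```python
# Hypothetical helper (not part of GraphDB): assembles storage-tool arguments
# in the documented "-name=value" form and checks the documented value ranges.
from __future__ import annotations

SRC_INDEXES = {"pso", "pos", "pcso", "pcos"}
DEST_INDEXES = SRC_INDEXES | {"predicates"}


def build_args(command: str, storage: str, **options: object) -> list[str]:
    """Return the argument list, e.g. ["-command=scan", "-storage=/repo/storage"]."""
    if not command or not storage:
        raise ValueError("-command and -storage are mandatory")
    # -esize defaults to 32 in the tool; only 32 and 40 are valid.
    if options.get("esize", 32) not in (32, 40):
        raise ValueError("-esize must be 32 or 40")
    if "srcIndex" in options and options["srcIndex"] not in SRC_INDEXES:
        raise ValueError(f"unknown -srcIndex: {options['srcIndex']}")
    if "destIndex" in options and options["destIndex"] not in DEST_INDEXES:
        raise ValueError(f"unknown -destIndex: {options['destIndex']}")
    args = [f"-command={command}", f"-storage={storage}"]
    # Every other option (pageCacheSize, destFile, ...) is passed through
    # unchanged, in the order given.
    args += [f"-{name}={value}" for name, value in options.items()]
    return args


print(build_args("rebuild", "/repo/storage", esize=40, srcIndex="pso", destIndex="pos"))
# ['-command=rebuild', '-storage=/repo/storage', '-esize=40', '-srcIndex=pso', '-destIndex=pos']
```

Prepending the path to bin/storage-tool and passing the list to subprocess.run would be one way to call the tool from a script; that wrapper is likewise an assumption.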

Supported commands

  • scan - scans the repository index(es) and prints statistics about the number of statements and repository consistency;
  • rebuild - uses the source index srcIndex to rebuild the destination index destIndex. If srcIndex = destIndex, it compacts destIndex. If srcIndex is omitted and destIndex = predicates, it only rebuilds the predicates statistics index;
  • replace - replaces an existing entity -origURI with a non-existing one -replURI;
  • repair - repairs the repository indexes and restores data, a better variant of the merge index;
  • export - uses the source index srcIndex to export repository data to the destination file destFile. Supported destination file formats (by extension): .trig, .ttl, .nq.
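
Each command needs a different subset of the options. The sketch below encodes one reading of the command descriptions above as a pre-flight check run before invoking the tool. The rules are inferred from this section, not taken from the tool itself, and the function name option_problems is hypothetical.

```python
# Hypothetical pre-flight check (not part of GraphDB). The per-command rules
# are inferred from the "Supported commands" descriptions, not from the tool.
from __future__ import annotations

import os

EXPORT_EXTENSIONS = {".trig", ".ttl", ".nq"}


def option_problems(command: str, opts: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the call looks complete."""
    problems: list[str] = []
    if command == "rebuild":
        if "destIndex" not in opts:
            problems.append("rebuild needs -destIndex")
        # -srcIndex may be omitted only when rebuilding the predicates index.
        elif "srcIndex" not in opts and opts["destIndex"] != "predicates":
            problems.append("rebuild needs -srcIndex unless -destIndex=predicates")
    elif command == "replace":
        for name in ("origURI", "replURI"):
            if name not in opts:
                problems.append(f"replace needs -{name}")
    elif command == "export":
        if "srcIndex" not in opts:
            problems.append("export needs -srcIndex")
        extension = os.path.splitext(opts.get("destFile", ""))[1]
        if extension not in EXPORT_EXTENSIONS:
            problems.append("export needs -destFile ending in .trig, .ttl or .nq")
    elif command not in ("scan", "repair"):
        problems.append(f"unknown command: {command}")
    return problems


print(option_problems("export", {"srcIndex": "pos", "destFile": "/repo/storage/f.csv"}))
# ['export needs -destFile ending in .trig, .ttl or .nq']
```

Returning a list of messages rather than raising on the first problem lets a script report every missing option at once.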

Examples

  • scan the repository, print statement statistics and repository consistency status:

    -command=scan -storage=/repo/storage
    
    • when everything is OK
    __________________________________________scan results__________________________________________
    mask |              pso |              pos |             pcso |             pcos | diff | flags
    0001 |               19 |               19 |               19 |               19 |   OK | INF
    0002 |               25 |               25 |               25 |               25 |   OK | EXP
    0005 |              102 |              102 |              102 |              102 |   OK | INF RO
    
    __________________________________________additional checks__________________________________________
         |              pso |              pos |             pcso |             pcos | stat | check-type
         |             2e9d |             2e9d |             2e9d |             2e9d |   OK | checksum
         |                0 |                0 |                0 |                0 |   OK | literals as subjects
         |                0 |                0 |                0 |                0 |   OK | literals as predicates
         |                0 |                0 |                0 |                0 |   OK | literals as contexts
         |                0 |                0 |                0 |                0 |   OK | blanks as predicates
         |             true |             true |             true |             true |   OK | page consistency
         |                - |                - |                - |                - |   OK | epool consistency
    
    Scan determines that this repo image is consistent!
    
    • when there are broken indexes
    __________________________________________scan results__________________________________________
    mask |              pso |              pos |             pcso |             pcos | diff | flags
    0001 |      310,512,696 |      310,512,696 |      310,512,697 |      310,512,696 |  ERR | INF
    0002 |      183,244,533 |      183,244,533 |      183,244,534 |      183,244,533 |  ERR | EXP
    0005 |              102 |              102 |              102 |              102 |   OK | INF RO
    0020 |              235 |              215 |               19 |                0 |   OK | DEL
    0021 |              687 |              821 |                0 |                0 |   OK | INF DEL
    0022 |              911 |              975 |                0 |                0 |   OK | EXP DEL
    
    __________________________________________additional checks__________________________________________
         |              pso |              pos |             pcso |             pcos | stat | check-type
         | ffffffffce1a908d | ffffffffce1a908d | ffffffffda22fb99 | ffffffffce1a908d |  ERR | checksum
         |                0 |                0 |                0 |                0 |   OK | literals as subjects
         |                0 |                0 |                0 |                0 |   OK | literals as predicates
         |                0 |                0 |                0 |                0 |   OK | literals as contexts
         |                0 |                0 |                0 |                0 |   OK | blanks as predicates
         |             true |             true |             true |             true |   OK | page consistency
         |                - |                - |                - |                - |   OK | epool consistency
    
    Scan determines that this repo image is INCONSISTENT
    

    pcso contains more statements than the other indexes, so we have the following options:

    • rebuild pcso from one of the other indexes
    • rebuild all other indexes from pcso, because it has one more statement and we do not want to lose it
  • scan the PSO index of a 40-bit repository and print a status message every 60 seconds:

    -command=scan -storage=/repo/storage -srcIndex=pso -esize=40 -statusPrintInterval=60
    
  • compact the PSO index (self-rebuild equals compacting):

    -command=rebuild -storage=/repo/storage -esize=40 -srcIndex=pso -destIndex=pso
    
  • rebuild the POS index from the PSO index and compact POS:

    -command=rebuild -storage=/repo/storage -esize=40 -srcIndex=pso -destIndex=pos
    
  • rebuild the predicates statistics index:

    -command=rebuild -storage=/repo/storage -esize=40 -destIndex=predicates
    
  • replace http://onto.com#e1 with http://onto.com#e2:

    -command=replace -storage=/repo/storage -origURI=<http://onto.com#e1> -replURI=<http://onto.com#e2>
    
  • dump the repository data using the POS index into the file f.trig:

    -command=export -storage=/repo/storage -srcIndex=pos -destFile=/repo/storage/f.trig
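
The reasoning applied to the broken-index scan above (find the index whose statement count disagrees with the others, then decide what to rebuild from what) can be partly automated. The sketch below parses the scan results table, assuming the layout shown in the examples, and for each row marked ERR lists the indexes whose count differs from the most common count in that row. The function names are hypothetical, and the output format may differ between GraphDB versions.

```python
# Hypothetical helpers (not part of GraphDB): read the text printed by
# -command=scan, assuming the layout shown in the examples above.
from __future__ import annotations

from collections import Counter


def outlier_indexes(scan_output: str) -> dict[str, list[str]]:
    """For each ERR row of the scan results table, list the indexes whose
    statement count differs from the most common count in that row."""
    result: dict[str, list[str]] = {}
    header: list[str] = []
    for line in scan_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if cells[0] == "mask":
            header = cells[1:-2]  # index columns between "mask" and "diff | flags"
            continue
        n = len(header)
        # Skip banners, blank lines, OK rows and the unlabeled
        # "additional checks" rows (their first cell is empty).
        if not header or not cells[0] or len(cells) < n + 3 or cells[n + 1] != "ERR":
            continue
        counts = [int(cell.replace(",", "")) for cell in cells[1 : n + 1]]
        # On a tie the first count seen wins, so the result is only a hint.
        majority, _ = Counter(counts).most_common(1)[0]
        result[cells[0]] = [name for name, c in zip(header, counts) if c != majority]
    return result


def is_consistent(scan_output: str) -> bool:
    """True if the output contains the 'consistent!' verdict from the examples."""
    return "this repo image is consistent!" in scan_output


SAMPLE = """\
mask |              pso |              pos |             pcso |             pcos | diff | flags
0001 |      310,512,696 |      310,512,696 |      310,512,697 |      310,512,696 |  ERR | INF
0002 |      183,244,533 |      183,244,533 |      183,244,534 |      183,244,533 |  ERR | EXP
0005 |              102 |              102 |              102 |              102 |   OK | INF RO
Scan determines that this repo image is INCONSISTENT
"""

print(outlier_indexes(SAMPLE))  # {'0001': ['pcso'], '0002': ['pcso']}
print(is_consistent(SAMPLE))  # False
```

On the sample rows from the broken-index example, pcso is the only outlier, which matches the two options listed there. The majority vote only points at the disagreeing index; choosing between rebuilding pcso and rebuilding the other indexes from it is still a decision about which statements to keep.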