storage-tool
The storage-tool is an application for scanning and repairing a GraphDB repository.
To run it, execute the bin/storage-tool script in the GraphDB distribution folder.
For help, run:
bin/storage-tool --help
Note
The tool works only on repository images that are not in use (i.e., when the database is down).
Supported commands
Command | Description |
---|---|
scan | Scans repository index(es) and prints statistics for the number of statements and repo consistency. |
rebuild | Uses the source index ( |
replace | Replaces an existing entity |
repair | Repairs the repository indexes and restores data, a better variant of the merge index. |
export | Uses the source index ( |
epool | Scans the entity pool for inconsistencies and checks for invalid IRIs. IRIs are validated against the RFC 3987 standard. Invalid IRIs will be listed in an |
--help | Prints command-specific help messages. |
Options
Parameter | Short version | Description | Default value |
---|---|---|---|
--storage | -s | (required) Absolute path to repo storage directory | null |
--help | -h | Prints out help messages | |
--src-index | -r | Predicate collection to be used as source. Can be one of pso, pos. | null |
--dest-index | -d | Predicate collection to be used as destination. Can be one of pso, pos, cpso, predicates. | null |
--origin-uri | -o | Original existing URI in the repository to be replaced | null |
--repl-uri | -n | New non-existing URI in the repository to replace the original | null |
--dest-file | -f | Path to file used to store exported data. Supported formats: | null |
--fix | -x | Lists or fixes ePool problems. | |
--check-pred-statistics | -c | Runs additional check of predicates statistics | |
--status-print-interval | -i | Interval between status message printing (in seconds) | |
--page-cache-size | -p | Size of the page cache (in thousands). | |
--positive-filter-status | -v | Optional statement status filter during export | |
--sort-buffer-size | -b | Size of the external sort buffer | |
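The constraints listed in the options table (an absolute --storage path, and a fixed set of values for --src-index and --dest-index) can be checked before the tool is started. The following is a minimal sketch, not part of the storage-tool: the function names are hypothetical, the lowercase-only matching is an assumption based on the values shown in the table, and the last function only prints the rebuild command instead of running it.

```shell
#!/bin/sh
# Hypothetical pre-flight checks; these helpers are illustrative only and
# are not shipped with the storage-tool.

# Succeeds only for an absolute path, as the --storage option requires.
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Succeeds if the value is listed for --src-index (pso or pos).
# Assumption: values are matched in lowercase, as written in the table.
is_valid_src_index() {
  case "$1" in
    pso|pos) return 0 ;;
    *)       return 1 ;;
  esac
}

# Succeeds if the value is listed for --dest-index.
is_valid_dest_index() {
  case "$1" in
    pso|pos|cpso|predicates) return 0 ;;
    *)                       return 1 ;;
  esac
}

# Validates the three arguments and prints the rebuild command (dry run).
print_rebuild_command() {
  storage=$1
  src=$2
  dest=$3
  is_absolute_path "$storage" || { echo "storage path must be absolute" >&2; return 1; }
  is_valid_src_index "$src"   || { echo "invalid source index: $src" >&2; return 1; }
  is_valid_dest_index "$dest" || { echo "invalid destination index: $dest" >&2; return 1; }
  echo "bin/storage-tool rebuild --storage $storage --src-index=$src --dest-index=$dest"
}
```

Printing the command rather than executing it keeps the sketch safe to try on a machine where the repository is still in use.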
Examples
scan the repository, print statement statistics and repository consistency status:
bin/storage-tool scan --storage /<path-to-repo>/storage
when everything is OK
Scan result consistency check!
_______________________scan results_______________________
mask | pso | pos | diff | flags
0001 | 29,937,266 | 29,937,266 | OK | INF
0002 | 61,251,058 | 61,251,058 | OK | EXP
0005 | 145 | 145 | OK | INF RO
0006 | 8,134 | 8,134 | OK | EXP RO
0009 | 1,661,585 | 1,661,585 | OK | INF HID
000a | 2,834,694 | 2,834,694 | OK | EXP HID
0011 | 1,601,875 | 1,601,875 | OK | INF EQ
0012 | 1,934,013 | 1,934,013 | OK | EXP EQ
0020 | 309 | 221 | OK | DEL
0021 | 15 | 23 | OK | INF DEL
0022 | 34 | 30 | OK | EXP DEL
_______________________additional checks_______________________
| pso | pos | stat | check-type
| 59b30d4d | 59b30d4d | OK | checksum
| 0 | 0 | OK | not existing ids
| 0 | 0 | OK | literals as subjects
| 0 | 0 | OK | literals as predicates
| 0 | 0 | OK | literals as contexts
| 0 | 0 | OK | blanks as predicates
| true | true | OK | page consistency
| 80b9ad24 | 80b9ad24 | OK | cpso crc
| - | - | OK | epool duplicate ids
| - | - | OK | epool consistency
| - | - | OK | literal index consistency
| - | - | OK | triple entity index consistency
Scan determines that this repo image is consistent.
when there are broken indexes
_______________________scan results_______________________
mask | pso | pos | diff | flags
0001 | 29,284,580 | 29,284,580 | OK | INF
0002 | 63,559,252 | 63,559,252 | OK | EXP
0004 | 8,134 | 8,134 | OK | RO
0005 | 1,140 | 1,140 | OK | INF RO
0009 | 1,617,004 | 1,617,004 | OK | INF HID
000a | 3,068,289 | 3,068,289 | OK | EXP HID
0011 | 1,599,375 | 1,599,375 | OK | INF EQ
0012 | 2,167,536 | 2,167,536 | OK | EXP EQ
0020 | 327 | 254 | OK | DEL
0021 | 11 | 12 | OK | INF DEL
0022 | 31 | 24 | OK | EXP DEL
004a | 17 | 17 | OK | EXP HID MRK
_______________________additional checks_______________________
| pso | pos | stat | check-type
| ffffffff93e6a372 | ffffffff93e6a372 | OK | checksum
| 0 | 0 | OK | not existing ids
| 0 | 0 | OK | literals as subjects
| 0 | 0 | OK | literals as predicates
| 0 | 0 | OK | literals as contexts
| 0 | 0 | OK | blanks as predicates
| true | true | OK | page consistency
| bf55ab00 | bf55ab00 | OK | cpso crc
| - | - | OK | epool duplicate ids
| - | - | OK | epool consistency
| - | - | ERR | literal index consistency
Scan determines that this repo image is INCONSISTENT.
In this case the literals index contains more statements than there are literals in the epool, so it has to be rebuilt.
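A saved scan report can also be checked mechanically instead of being read by eye. The sketch below is only an illustration: the function names are hypothetical, and it assumes the report has one row per line, that rows in the additional checks section start with a `|` character, and that the final verdict line is worded as in the sample output.

```shell
#!/bin/sh
# Hypothetical helpers for a saved scan report; not part of the storage-tool.

# Reads a report on stdin and succeeds only if it contains the positive
# verdict. The match is case-sensitive, so "INCONSISTENT" does not match.
report_is_consistent() {
  grep -q 'repo image is consistent'
}

# Reads a report on stdin and prints the check-type of every row in the
# additional checks section whose stat column is ERR.
failed_checks() {
  awk -F'|' '
    # Additional-check rows start with "|", so the first field is empty.
    NF >= 5 && $1 ~ /^[ \t]*$/ {
      status = $4
      name = $5
      gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (status == "ERR") print name
    }'
}
```

Assuming the tool writes its report to standard output, the output of `bin/storage-tool scan --storage /<path-to-repo>/storage` could be redirected to a file and passed to `failed_checks` on standard input. For the inconsistent sample report, that would print `literal index consistency`.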
scan the PSO index and print a status message every 60 seconds:
bin/storage-tool scan --storage /<path-to-repo>/storage --src-index=pso --status-print-interval=60
compact the PSO index (self-rebuild equals compacting):
bin/storage-tool rebuild --storage /<path-to-repo>/storage --src-index=pso --dest-index=pso
rebuild the POS index from the PSO index and compact POS:
bin/storage-tool rebuild --storage /<path-to-repo>/storage --src-index=pso --dest-index=pos
rebuild the predicates statistics index:
bin/storage-tool rebuild --storage /<path-to-repo>/storage --dest-index=predicates
replace http://onto.com#e1 with http://onto.com#e2:
bin/storage-tool replace --storage /<path-to-repo>/storage --origin-uri="<http://onto.com#e1>" --repl-uri="<http://onto.com#e2>"
dump the repository data using the POS index into an f.trig file:
bin/storage-tool export --storage /<path-to-repo>/storage --src-index=pos --dest-file=/repo/storage/f.trig
scan the entity pool and create a report with invalid IRIs, if such exist:
bin/storage-tool epool --storage /<path-to-repo>/storage