Monitoring GraphDB state¶
Monitoring GraphDB’s internal state and behaviour is important for identifying issues that need the administrator’s attention. You can do this through the Workbench, the JMX interface, or third-party tools such as New Relic, OpsWorks, and others. Depending on the cluster topology and setup, different metrics are useful for analysing the state and performance of the database. We recommend monitoring the following:
- Total transactions - the number of transactions processed by the cluster per minute;
- Workers throughput - the number of transactions processed by the worker per minute;
- Workers load average - load average, as returned by the OS for the worker;
- Workers CPU usage - CPU usage in percent as returned by the OS;
- Masters load average - load average, as returned by the OS for the master;
- Heap workers - heap memory usage as reported by the JVM.
Note
All of this data can be obtained from the JMX console and visualised in the tool of your choice.
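The per-minute metrics in the list above are rates. As a minimal sketch, assuming your monitoring tool samples a cumulative transaction count at known times (the `Sample` type and the way the count is obtained are hypothetical, not part of GraphDB), a per-minute rate can be derived from two samples like this:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative counter (hypothetical data source)."""
    timestamp_s: float        # time of the reading, in seconds
    total_transactions: int   # cumulative number of transactions so far


def transactions_per_minute(earlier: Sample, later: Sample) -> float:
    """Return the average number of transactions per minute between two samples."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = later.total_transactions - earlier.total_transactions
    # Scale the observed change to a 60-second window.
    return delta * 60.0 / elapsed
```

For example, two samples taken 30 seconds apart with counts 100 and 160 correspond to 120 transactions per minute. The same calculation applies to a single worker or to the whole cluster, depending on which counter is sampled.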
Using the Workbench¶
The Workbench provides information about the system resources, as well as a way to trigger garbage collection.
JMX¶
GraphDB uses the JMX interface to provide information about its internal state and behaviour, including trend information, as well as operations for intervening in certain database activities. To configure the JMX endpoint, see Administration tools.
Once GraphDB EE is installed, it registers a number of JMX MBeans for every repository, each providing a different set of information and functions for specific features.
Logs¶
Status requests in the enterprise logs on the master¶
The master writes status requests to the enterprise logs only when the log level is DEBUG. They look like this:
[INFO ] 2015-10-14 14:50:16,153 [worker http://localhost:20027/repositories/worker |
c.o.t.r.WorkerThread] Status: <a3d9f184-7c45-4580-986a-edf18dfb54b2> @-1148798447
[...] {SI_fingerprint=-1148798447 [direct=0][expose-entity=0][geospatial=0]
[literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
[rdfrank=0][script=0] 0 32, SI_has_StorageSizeOnDisk=17442317,
SI_number_of_explicit_triples=2, SI_has_ActiveCommit=false,
SI_has_FreeDiskSpace=43930714112, SI_sucessful_commits=2, SI_number_of_triples=108,
SI_has_Revision=903827708}
where:
- the worker is on transaction a3d9f184-7c45-4580-986a-edf18dfb54b2 and the fingerprint’s main part is -1148798447;
- the full fingerprint is:
-1148798447 [direct=0][expose-entity=0][geospatial=0][literals-index=0][lucene=0]
[notifications=0][plugincontrol=0][rdfpriming=0][rdfrank=0][script=0] 0 32
- the worker uses revision number 903827708 of the GraphDB engine (which can be important if you are doing a rolling upgrade).
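The fields described above can also be extracted programmatically. The following is a minimal sketch, assuming status entries keep the layout of the example in this section; the regular expressions and the `parse_status` helper are illustrative and not part of GraphDB:

```python
import re

# Transaction id and main fingerprint part, e.g. "Status: <id> @-1148798447".
STATUS_RE = re.compile(r"Status: <(?P<tx>[0-9a-f-]+)> @(?P<fp>-?\d+)")
# "SI_name=value" pairs; a value ends at a comma, brace, bracket or whitespace.
FIELD_RE = re.compile(r"(SI_\w+)=([^,}\s\[]+)")

# The status entry from this section, wrapped exactly as shown above.
SAMPLE = """[INFO ] 2015-10-14 14:50:16,153 [worker http://localhost:20027/repositories/worker |
c.o.t.r.WorkerThread] Status: <a3d9f184-7c45-4580-986a-edf18dfb54b2> @-1148798447
[...] {SI_fingerprint=-1148798447 [direct=0][expose-entity=0][geospatial=0]
[literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
[rdfrank=0][script=0] 0 32, SI_has_StorageSizeOnDisk=17442317,
SI_number_of_explicit_triples=2, SI_has_ActiveCommit=false,
SI_has_FreeDiskSpace=43930714112, SI_sucessful_commits=2, SI_number_of_triples=108,
SI_has_Revision=903827708}"""


def parse_status(entry: str) -> dict:
    """Extract the transaction id, main fingerprint and SI_* fields from one entry."""
    text = " ".join(entry.split())  # undo line wrapping
    match = STATUS_RE.search(text)
    if match is None:
        raise ValueError("not a status entry")
    return {
        "transaction": match.group("tx"),
        "fingerprint": int(match.group("fp")),
        "fields": dict(FIELD_RE.findall(text)),  # values are kept as strings
    }


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(status["transaction"], status["fingerprint"])
    print("engine revision:", status["fields"]["SI_has_Revision"])
```

Field values are left as strings because the entry mixes numbers and booleans; convert them as needed, for example to compare `SI_has_Revision` across workers during a rolling upgrade.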
Out of sync exceptions¶
Workers can go out of sync for several reasons, such as network failures or connectivity problems.
Different fingerprints after an update¶
If the expected fingerprint does not match the one that the non-testing worker returns, the latter is marked as out of sync.
This appears in the logs as follows:
[ERROR] 2015-10-14 14:50:05,867 [worker http://localhost:20043/repositories/worker-test20042 |
c.o.t.r.WorkerThread] Worker out of sync com.ontotext.trree.replicationcluster.OutOfSyncException:
Out of sync (417) after <> @406935413051841 [...]: Fingerprint mismatch; expected
<49a45831-e9f4-4edc-b1b4-4949a8f15adc> @-1992527762 [direct=0][expose-entity=0][geospatial=0]
[literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0][rdfrank=0][script=0]
0 32; executed <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @406933420523974 [direct=0][expose-entity=0]
[geospatial=0][literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
[rdfrank=0][script=0] 0 32: <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @406933420523974 [...],
expected: <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @-1992527762 [...]
In this example, the transaction with ID 49a45831-e9f4-4edc-b1b4-4949a8f15adc has already been tested on one of the workers, which returned the following expected fingerprint:
-1992527762 [direct=0][expose-entity=0][geospatial=0][literals-index=0][lucene=0][notifications=0]
[plugincontrol=0][rdfpriming=0][rdfrank=0][script=0] 0 32
It differs from the fingerprint returned for the same transaction ID 49a45831-e9f4-4edc-b1b4-4949a8f15adc on this worker (worker-test20042):
406933420523974 [direct=0][expose-entity=0][geospatial=0][literals-index=0][lucene=0][notifications=0]
[plugincontrol=0][rdfpriming=0][rdfrank=0][script=0] 0 32
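The comparison described above can be automated when scanning logs. The following is a minimal sketch, assuming mismatch entries keep the layout of the example in this section; the regular expression and the `parse_mismatch` helper are illustrative and not part of GraphDB:

```python
import re
from typing import Optional

# Captures the main part of the expected and the executed fingerprint.
MISMATCH_RE = re.compile(
    r"Fingerprint mismatch; expected <(?P<expected_tx>[0-9a-f-]*)> @(?P<expected_fp>-?\d+)"
    r".*?executed <(?P<executed_tx>[0-9a-f-]*)> @(?P<executed_fp>-?\d+)"
)

# The out-of-sync entry from this section, wrapped exactly as shown above.
SAMPLE = """[ERROR] 2015-10-14 14:50:05,867 [worker http://localhost:20043/repositories/worker-test20042 |
c.o.t.r.WorkerThread] Worker out of sync com.ontotext.trree.replicationcluster.OutOfSyncException:
Out of sync (417) after <> @406935413051841 [...]: Fingerprint mismatch; expected
<49a45831-e9f4-4edc-b1b4-4949a8f15adc> @-1992527762 [direct=0][expose-entity=0][geospatial=0]
[literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0][rdfrank=0][script=0]
0 32; executed <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @406933420523974 [direct=0][expose-entity=0]
[geospatial=0][literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
[rdfrank=0][script=0] 0 32: <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @406933420523974 [...],
expected: <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @-1992527762 [...]"""


def parse_mismatch(entry: str) -> Optional[dict]:
    """Return transaction id and both fingerprints, or None if no mismatch is reported."""
    text = " ".join(entry.split())  # undo line wrapping
    match = MISMATCH_RE.search(text)
    if match is None:
        return None
    return {
        "transaction": match.group("executed_tx"),
        "expected": int(match.group("expected_fp")),
        "executed": int(match.group("executed_fp")),
    }


if __name__ == "__main__":
    info = parse_mismatch(SAMPLE)
    if info is not None and info["expected"] != info["executed"]:
        print("transaction", info["transaction"], "expected", info["expected"],
              "but worker returned", info["executed"])
```

Only the main part of each fingerprint is compared here; the bracketed plugin sections are ignored in this sketch.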