Monitoring GraphDB State

Monitoring GraphDB's internal state and behavior helps you identify issues that need administrator attention. You can do this through the Workbench, the JMX interface, or third-party tools such as New Relic and OpsWorks. Which metrics are useful for analyzing the state and performance of the database depends on the cluster topology and setup. We recommend monitoring the following:

  • Total transactions - the number of transactions processed by the cluster per minute;
  • Workers throughput - the number of transactions processed by each worker per minute;
  • Workers load average - load average, as returned by the OS for the worker;
  • Workers CPU usage - CPU usage in percent, as returned by the OS for the worker;
  • Masters load average - load average, as returned by the OS for the master;
  • Heap workers - heap memory usage of the worker, as reported by the JVM.

Note

All of these metrics can be obtained from the JMX console and visualized in the tool of your choice.
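
The per-minute metrics in the list above can be derived from two readings of a cumulative counter. The following is a minimal sketch of that arithmetic, not part of GraphDB: the `Sample` class and the `per_minute` function are hypothetical names, and how the counter value is read (JMX or otherwise) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One reading of a cumulative transaction counter (hypothetical shape)."""
    timestamp_s: float  # seconds since any fixed reference point
    transactions: int   # cumulative transaction count at that moment


def per_minute(earlier: Sample, later: Sample) -> float:
    """Return the number of transactions per minute between two samples."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = later.transactions - earlier.transactions
    if delta < 0:
        # Assumption: a counter that goes backwards was reset (e.g. a restart),
        # so only the transactions counted since the reset are used.
        delta = later.transactions
    return delta * 60.0 / elapsed


if __name__ == "__main__":
    # 30 transactions in 30 seconds is a rate of 60 per minute.
    print(per_minute(Sample(0.0, 100), Sample(30.0, 130)))
```

Sampling at a fixed interval and feeding consecutive pairs to `per_minute` gives a series that any charting tool can plot.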

Using the Workbench

The Workbench provides information about the system resources, as well as a way to run garbage collection.

JMX

GraphDB uses the JMX interface to provide information about its internal state and behavior, including trend information, as well as operations for intervening in certain database activities. To configure the JMX endpoint, see Administration tools.

Once GraphDB EE is installed, it registers a number of JMX MBeans for every repository, each providing a different set of information and functions for specific features.

Logs

Status requests in the enterprise logs on the master

Status requests are written to the enterprise logs on the master only when the log level is DEBUG. They look like this:

[INFO ] 2015-10-14 14:50:16,153 [worker http://localhost:20027/repositories/worker |
  c.o.t.r.WorkerThread] Status: <a3d9f184-7c45-4580-986a-edf18dfb54b2> @-1148798447
  [...] {SI_fingerprint=-1148798447 [direct=0][expose-entity=0][geospatial=0]
  [literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
  [rdfrank=0][script=0] 0 32, SI_has_StorageSizeOnDisk=17442317,
  SI_number_of_explicit_triples=2, SI_has_ActiveCommit=false,
  SI_has_FreeDiskSpace=43930714112, SI_sucessful_commits=2, SI_number_of_triples=108,
  SI_has_Revision=903827708}

where:

  • the worker is on transaction a3d9f184-7c45-4580-986a-edf18dfb54b2 and the fingerprint’s main part is -1148798447;
  • the full fingerprint is:
-1148798447 [direct=0][expose-entity=0][geospatial=0][literals-index=0][lucene=0]
  [notifications=0][plugincontrol=0][rdfpriming=0][rdfrank=0][script=0] 0 32
  • the worker uses revision number 903827708 of the GraphDB engine (which matters if you are in the middle of a rolling upgrade).
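
The fields described above can be pulled out of a status entry with a couple of regular expressions. This is a rough sketch based only on the sample entry shown here, not on a documented log format: `parse_status`, `STATUS_RE`, and `FIELD_RE` are hypothetical names, and the patterns assume that entries always follow the layout of the sample.

```python
import re

# Transaction ID and main part of the fingerprint: "Status: <uuid> @number".
STATUS_RE = re.compile(r"Status: <(?P<tx>[0-9a-f-]+)> @(?P<fp>-?\d+)")
# "SI_name=value" pairs; a value ends at a comma, closing brace, or whitespace.
FIELD_RE = re.compile(r"(SI_\w+)=([^,}\s]+)")


def parse_status(entry: str) -> dict:
    """Extract the transaction, fingerprint, and SI_* fields from one entry."""
    text = " ".join(entry.split())  # undo the line wrapping of the log entry
    match = STATUS_RE.search(text)
    if match is None:
        raise ValueError("not a status entry")
    return {
        "transaction": match["tx"],
        "fingerprint": int(match["fp"]),
        "fields": dict(FIELD_RE.findall(text)),
    }


# The sample entry from this section, copied as-is.
SAMPLE = """[INFO ] 2015-10-14 14:50:16,153 [worker http://localhost:20027/repositories/worker |
  c.o.t.r.WorkerThread] Status: <a3d9f184-7c45-4580-986a-edf18dfb54b2> @-1148798447
  [...] {SI_fingerprint=-1148798447 [direct=0][expose-entity=0][geospatial=0]
  [literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
  [rdfrank=0][script=0] 0 32, SI_has_StorageSizeOnDisk=17442317,
  SI_number_of_explicit_triples=2, SI_has_ActiveCommit=false,
  SI_has_FreeDiskSpace=43930714112, SI_sucessful_commits=2, SI_number_of_triples=108,
  SI_has_Revision=903827708}"""

if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(status["transaction"], status["fingerprint"])
    print(status["fields"]["SI_has_Revision"])
```

All field values are kept as strings, so a caller decides which ones to convert to numbers or booleans.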

Out of sync exceptions

Workers can go out of sync for various reasons, such as network failures or other connectivity problems.

Different fingerprints after an update

If the expected fingerprint does not match the one returned by a non-testing worker, that worker is marked as out of sync.

The logs then show an entry like this:

[ERROR] 2015-10-14 14:50:05,867 [worker http://localhost:20043/repositories/worker-test20042 |
  c.o.t.r.WorkerThread] Worker out of sync com.ontotext.trree.replicationcluster.OutOfSyncException:
  Out of sync (417) after <> @406935413051841 [...]: Fingerprint mismatch; expected
  <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @-1992527762 [direct=0][expose-entity=0][geospatial=0]
  [literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0][rdfrank=0][script=0]
  0 32; executed <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @406933420523974 [direct=0][expose-entity=0]
  [geospatial=0][literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
  [rdfrank=0][script=0] 0 32: <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @406933420523974 [...],
  expected: <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @-1992527762 [...]

In this example, the transaction with ID 49a45831-e9f4-4edc-b1b4-4949a8f15adc has already been tested on one of the workers, which produced the following expected fingerprint:

-1992527762 [direct=0][expose-entity=0][geospatial=0][literals-index=0][lucene=0][notifications=0]
  [plugincontrol=0][rdfpriming=0][rdfrank=0][script=0] 0 32

It differs from the fingerprint returned for the same transaction ID 49a45831-e9f4-4edc-b1b4-4949a8f15adc by this worker (worker-test20042):

406933420523974 [direct=0][expose-entity=0][geospatial=0][literals-index=0][lucene=0][notifications=0]
  [plugincontrol=0][rdfpriming=0][rdfrank=0][script=0] 0 32
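
When scanning logs for this situation, it can help to extract the worker, the transaction ID, and both fingerprints from the entry. The sketch below does that for the entry shown in this section. It is illustrative only: `find_mismatch`, `Mismatch`, and `MISMATCH_RE` are hypothetical names, and the pattern assumes that the message layout matches the sample.

```python
import re
from typing import NamedTuple, Optional

# Worker URL, then the expected and the executed fingerprint main parts.
MISMATCH_RE = re.compile(
    r"\[worker (?P<worker>\S+).*?"
    r"Fingerprint mismatch; expected <(?P<tx>[0-9a-f-]+)> @(?P<expected>-?\d+)"
    r".*?executed <[0-9a-f-]+> @(?P<executed>-?\d+)"
)


class Mismatch(NamedTuple):
    worker: str
    transaction: str
    expected: int
    executed: int


def find_mismatch(entry: str) -> Optional[Mismatch]:
    """Return the mismatch details, or None if the entry is something else."""
    text = " ".join(entry.split())  # undo the line wrapping of the log entry
    match = MISMATCH_RE.search(text)
    if match is None:
        return None
    return Mismatch(
        match["worker"],
        match["tx"],
        int(match["expected"]),
        int(match["executed"]),
    )


# First part of the sample entry from this section.
ENTRY = """[ERROR] 2015-10-14 14:50:05,867 [worker http://localhost:20043/repositories/worker-test20042 |
  c.o.t.r.WorkerThread] Worker out of sync com.ontotext.trree.replicationcluster.OutOfSyncException:
  Out of sync (417) after <> @406935413051841 [...]: Fingerprint mismatch; expected
  <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @-1992527762 [direct=0][expose-entity=0][geospatial=0]
  [literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0][rdfrank=0][script=0]
  0 32; executed <49a45831-e9f4-4edc-b1b4-4949a8f15adc> @406933420523974 [direct=0][expose-entity=0]
  [geospatial=0][literals-index=0][lucene=0][notifications=0][plugincontrol=0][rdfpriming=0]
  [rdfrank=0][script=0] 0 32"""

if __name__ == "__main__":
    found = find_mismatch(ENTRY)
    if found is not None:
        print(found.worker, found.expected, found.executed)
```

Only the main part of each fingerprint is compared here; the plugin section in square brackets is ignored, which is a simplification.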