System monitoring¶
GraphDB offers several options for system monitoring, which are described in detail below.
Workbench monitoring¶
In the respective tabs of the GraphDB Workbench, you can monitor the most important hardware information as well as other application-related metrics:
Resource monitoring: system CPU load, file descriptors, heap memory usage, off-heap memory usage, and disk storage.
Performance (per repository): queries, global page cache, entity pool, and transactions and connections.
Cluster health (in case a cluster exists).

Prometheus monitoring¶
The GraphDB REST API exposes several monitoring endpoints suitable for scraping by Prometheus. They return data in a Prometheus-compatible text format when the request has an Accept header of type text/plain, which is the default type for Prometheus scrapers.
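This means no special header configuration is needed in the scrape job itself. A minimal sketch of such a job is shown below; my-graphdb-hostname and port 7200 are placeholders for your own deployment, and complete per-endpoint examples are given in the Prometheus setup section further down.

scrape_configs:
  - job_name: graphdb_monitoring
    metrics_path: /rest/monitor/infrastructure
    static_configs:
      - targets: [ 'my-graphdb-hostname:7200' ]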
GraphDB structures monitoring API¶
The /rest/monitor/structures endpoint enables you to monitor GraphDB structures, namely the global page cache and the entity pool. This provides a better understanding of whether the current GraphDB configuration is optimal for your specific use case (e.g., repository size, query complexity).
The current state of the global page cache and the entity pool is returned via the following metrics:
| Parameter | Description |
|---|---|
| | GraphDB’s global page cache hit count. Along with the global page cache miss count, this metric can be used to diagnose a small or oversized global page cache. |
| | GraphDB’s global page cache miss count. |
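If you scrape this endpoint with Prometheus, the hit and miss counters can be combined into a hit ratio, which is easier to watch over time. The sketch below is a recording rule in which graphdb_page_cache_hit and graphdb_page_cache_miss are placeholder metric names; check the actual output of /rest/monitor/structures for the exact names exposed by your GraphDB version.

groups:
  - name: graphdb-structures
    rules:
      # Placeholder metric names; replace them with the names returned by /rest/monitor/structures.
      - record: graphdb:page_cache_hit_ratio
        expr: graphdb_page_cache_hit / (graphdb_page_cache_hit + graphdb_page_cache_miss)

A consistently low ratio suggests that the global page cache may be too small for the workload.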
Infrastructure statistics monitoring API¶
The /rest/monitor/infrastructure endpoint enables you to monitor GraphDB’s infrastructure, giving you better visibility into hardware resource usage. It returns the most important hardware information and several application-related metrics:
| Parameter | Description |
|---|---|
| | Count of currently open file descriptors. This helps diagnose system slow-downs or slow storage if the number remains high for a long period of time. |
| | Current CPU load for the entire system, in %. |
| | Maximum available memory for the GraphDB instance. Returns -1 if the maximum is undefined. |
| | Initial amount of memory (controlled by the -Xms JVM option). |
| | Current committed memory in bytes. |
| | Current used memory in bytes. Along with the rest of the memory-related properties, this can be used to detect memory issues. |
| | Count of full garbage collections since the GraphDB instance was started. This metric is useful for detecting memory usage issues and system “freezes”. |
| | Off-heap initial memory in bytes. |
| | Maximum direct memory. Returns -1 if the maximum is undefined. |
| | Current off-heap committed memory in bytes. |
| | Current off-heap used memory in bytes. |
| | Used storage space on the partition where the data directory sits, in bytes. Together with the corresponding free storage metric, this is useful for detecting an imminent out-of-disk-space condition. |
| | Free storage space on the partition where the data directory sits, in bytes. |
| | Used storage space on the partition where the logs directory sits, in bytes. Together with the corresponding free storage metric, this is useful for detecting an imminent out-of-disk-space condition. |
| | Free storage space on the partition where the logs directory sits, in bytes. |
| | Used storage space on the partition where the work directory sits, in bytes. Together with the corresponding free storage metric, this is useful for detecting an imminent out-of-disk-space condition. |
| | Free storage space on the partition where the work directory sits, in bytes. |
| | Count of currently used threads. |
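One practical use of the storage metrics is alerting before a partition runs out of space. The sketch below is a Prometheus alerting rule under the assumption that the free-space metric for the data partition is exposed as graphdb_storage_data_free; this is a placeholder name, so check the actual scrape output of /rest/monitor/infrastructure for the exact one.

groups:
  - name: graphdb-infrastructure
    rules:
      # Placeholder metric name; replace it with the free-storage metric returned by /rest/monitor/infrastructure.
      - alert: GraphDBDataPartitionLowSpace
        expr: graphdb_storage_data_free < 10 * 1024 * 1024 * 1024  # less than ~10 GiB free
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "The GraphDB data partition has less than 10 GiB of free space"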
Cluster statistics monitoring API¶
Via the /rest/monitor/cluster endpoint, you can monitor GraphDB’s cluster statistics in order to diagnose problems and cluster slow-downs more easily. The endpoint returns several cluster-related metrics, and will not return anything if a cluster has not been created.
| Parameter | Description |
|---|---|
| | Count of leader elections since the cluster was created. A large number of leader elections may indicate an unstable cluster setup with nodes that are not always operating properly. |
| | Count of total failure recoveries in the cluster since the cluster was created. Includes both failed and successful recoveries. A large number of recoveries indicates issues with cluster stability. |
| | Count of failed transactions in the cluster. |
| | Total node count in the cluster. |
| | Count of nodes that are currently in sync. If this number is lower than the total node count, some nodes are out of sync, disconnected, or syncing. |
| | Count of nodes that are out of sync. If such nodes persist for a long period of time, this might indicate a failure in one or more nodes. |
| | Count of nodes that are disconnected. If such nodes persist for a long period of time, this might indicate a failure in one or more nodes. |
| | Count of nodes that are currently syncing. If such nodes persist for a long period of time, this might indicate a failure in one or more nodes. |
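The out-of-sync node count is a natural candidate for alerting, so that prolonged synchronization problems are noticed early. In the sketch below, graphdb_nodes_out_of_sync is a placeholder metric name; check the actual output of /rest/monitor/cluster for the exact name.

groups:
  - name: graphdb-cluster
    rules:
      # Placeholder metric name; replace it with the out-of-sync node count returned by /rest/monitor/cluster.
      - alert: GraphDBNodesOutOfSync
        expr: graphdb_nodes_out_of_sync > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "One or more GraphDB cluster nodes have been out of sync for 15 minutes"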
Query statistics monitoring API¶
Via the /rest/monitor/repository/{repositoryID} endpoint, you can monitor GraphDB’s query and transaction statistics in order to gain a better understanding of slow queries, suboptimal queries, active transactions, and open connections. This information helps identify possible issues more easily.
The endpoint exists for each repository, and a scrape configuration must be created for each repository that you want to monitor (see the example in the Prometheus setup section below). Normally, repositories are not created or deleted frequently, so the Prometheus scrape configurations should not need to change often either.
Important
In order for GraphDB to be able to return these metrics, the repository must be initialized.
The following metrics are exposed:
| Parameter | Description |
|---|---|
| | Count of slow queries executed on the repository. The counter is reset when the GraphDB instance is restarted. A high count of slow queries might indicate a setup issue, unoptimized queries, or insufficient hardware. |
| | Count of queries that the GraphDB engine could not evaluate and that were sent to the RDF4J engine for evaluation. A very high number might indicate that the queries typically run on the repository are not optimal. |
| | Count of currently active transactions. |
| | Count of currently open connections. If this number stays high for a long period of time, it might indicate an issue with connections not being closed once their job is done. |
| | GraphDB’s entity pool read count. Along with the entity pool write count, this metric can be used to diagnose a small or oversized entity pool. |
| | GraphDB’s entity pool write count. |
| | Current entity pool size, i.e., the entity count in the entity pool. |
Prometheus setup¶
To scrape the mentioned endpoints with Prometheus, you need to add scrape job configurations under the scrape_configs section of the Prometheus configuration file. Below is an example configuration for three of the endpoints, assuming a repository called “wines”.
- job_name: graphdb_queries_monitor
  metrics_path: /rest/monitor/repository/wines
  scrape_interval: 5s
  static_configs:
    - targets: [ 'my-graphdb-hostname:7200' ]
- job_name: graphdb_hw_monitor
  metrics_path: /rest/monitor/infrastructure
  scrape_interval: 5s
  static_configs:
    - targets: [ 'my-graphdb-hostname:7200' ]
- job_name: graphdb_structures_monitor
  metrics_path: /rest/monitor/structures
  scrape_interval: 5s
  static_configs:
    - targets: [ 'my-graphdb-hostname:7200' ]
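As noted in the query statistics section, one query-monitoring job is needed per repository. For example, a hypothetical second repository called “news” would get its own job alongside the “wines” one:

- job_name: graphdb_queries_monitor_news
  metrics_path: /rest/monitor/repository/news
  scrape_interval: 5s
  static_configs:
    - targets: [ 'my-graphdb-hostname:7200' ]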
Cluster monitoring¶
When configuring Prometheus to monitor a GraphDB cluster, the setup is similar, with a few differences.
In order to get the information for each cluster node, each node’s address must be included in the targets list.
The other difference is that another scraper must be configured to monitor the cluster status. This scraper can be configured in several ways:
Scrape only the external proxy (which will always point to the current cluster leader) if it exists in the current cluster configuration.
The downside of this method is that if, for some reason, there is a connectivity problem between the external proxy and the nodes, it will not report any metrics.
Scrape the external proxy and all cluster nodes.
This method enables you to receive metrics from all cluster nodes, including the external proxy. This way, you can see the cluster status even if the external proxy has issues connecting to the nodes. The downside is that, most of the time, the cluster metrics will be duplicated for each scraped node.
Scrape all cluster nodes (if there is no external proxy).
If there is no external proxy in the cluster setup, the only option is to monitor all nodes in order to determine the status of the entire cluster. If you choose only one node and it is down for some reason, you would not receive any cluster-related metrics.
The scraper configuration is similar to the previous ones, the only difference being that the targets array might contain one or more cluster nodes (and/or external proxies). For example, if you have a cluster with two external proxies and five cluster nodes, the scraper might be configured to scrape only the two proxies like so:
- job_name: graphdb_cluster_monitor
  metrics_path: /rest/monitor/cluster
  scrape_interval: 5s
  static_configs:
    - targets: [ 'graphdb-proxy-0:7200', 'graphdb-proxy-1:7200' ]
As mentioned, you can also include some or all of the cluster nodes if you want.
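For instance, a configuration that scrapes both external proxies and all five cluster nodes (all hostnames are placeholders for your own deployment) might look like this:

- job_name: graphdb_cluster_monitor
  metrics_path: /rest/monitor/cluster
  scrape_interval: 5s
  static_configs:
    - targets: [ 'graphdb-proxy-0:7200', 'graphdb-proxy-1:7200',
                 'graphdb-node-0:7200', 'graphdb-node-1:7200', 'graphdb-node-2:7200',
                 'graphdb-node-3:7200', 'graphdb-node-4:7200' ]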
JMX console monitoring¶
The database maintains a number of metrics that help tune the memory parameters and performance. They can be found in the JMX console under the com.ontotext.metrics package. The global metrics shared between the repositories are under the top-level package, and those specific to a repository are under com.ontotext.metrics.<repository-id>.

Page cache metrics¶
The global page cache provides metrics that help tune the amount of memory allocated to the page cache. It contains the following elements:
| Parameter | Description |
|---|---|
| | Counter for the pages that are evicted from the page cache and the amount of time it takes for them to be flushed to disk. |
| | Number of hits in the cache, i.e., the number of pages that do not need to be read from disk because they can be taken from the cache. |
| | Counter for the pages that have to be read from disk. The smaller this number, the better. |
| | Number of cache misses. The smaller this number, the better. If the number of hits is smaller than the number of misses, it is probably a good idea to increase the page cache memory. |
Entity pool metrics¶
You can monitor the number of reads and writes in the entity pool of each repository with the following parameters:
| Parameter | Description |
|---|---|
| | Counter for the number of reads in the entity pool. |
| | Counter for the number of writes in the entity pool. |