Encryption

Encryption in transit

All network traffic between clients and GraphDB, and between the GraphDB nodes in a cluster topology, can use either HTTP or HTTPS. Encrypting this traffic with SSL/TLS is strongly recommended, because it protects credentials and data against eavesdropping and tampering in transit.

Enable SSL/TLS

Because GraphDB runs on an embedded Tomcat server, the security configuration is standard, with a few exceptions. The official Tomcat documentation describes how to enable SSL/TLS. Additional information on configuring your GraphDB instance to use SSL/TLS can be found in the Configuration part of this document.

HTTPS in the Cluster

Because a lot of traffic flows between the cluster nodes, it is important to encrypt it. To do so, the following requirements should be met:

  1. SSL/TLS should be enabled on all cluster nodes.
  2. The nodes’ certificates should be trusted by the other nodes in the cluster.
  3. The URLs of the remote location (configured in Setup -> Repositories -> Attach Remote Location) should be using the HTTPS scheme.
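
The third requirement is easy to miss when a location was first attached over plain HTTP. As a minimal sketch, a list of remote location URLs could be checked for the HTTPS scheme as shown below. The class name, the method name, and the node URLs are hypothetical and are not part of GraphDB.

```java
import java.net.URI;
import java.util.List;

final class RemoteLocationCheck {
    // Returns true only if the URL parses and its scheme is "https" (case-insensitive).
    static boolean usesHttps(String url) {
        try {
            return "https".equalsIgnoreCase(URI.create(url).getScheme());
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
    }

    public static void main(String[] args) {
        // Hypothetical node addresses, for illustration only.
        List<String> locations = List.of(
                "https://node1.example.com:7200",
                "http://node2.example.com:7200");
        for (String location : locations) {
            System.out.println(location + " -> " + (usesHttps(location) ? "OK" : "not HTTPS"));
        }
    }
}
```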

Enabling SSL/TLS is described in the previous section; setting up a node for use in a cluster is no different. To establish certificate trust between the nodes, you have the following options.

Use certificates signed by a trusted Certification Authority

This way you do not need any additional configuration, and clients will not get a security warning when connecting to the nodes. The drawback is that these certificates are usually not free and you need to work with a third-party CA. This option is not covered in more detail here, as creating such a certificate is highly dependent on the CA.

Use self-signed certificates

The benefit is that you generate these certificates yourself and no third party needs to sign them. The drawback is that, by default, the nodes will not trust each other's certificates.

If you generate a separate self-signed certificate for each node in the cluster, each certificate has to be present in the Java Truststores of all other nodes in the cluster. You can do this either by adding the certificate to the default Java Truststore or by specifying an additional Truststore when running GraphDB. Information on how to generate a certificate, add it to a Truststore, and make the JVM use this Truststore can be found in the official Java documentation.
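
To illustrate the second approach, the sketch below loads an additional Truststore with the standard Java KeyStore API and builds an SSLContext that trusts only the certificates stored in it. The class name, the method names, and the PKCS12 store type are assumptions made for this example; it does not show how GraphDB itself loads a Truststore. JVM-wide, the same effect is normally achieved with the standard javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword system properties.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

final class TruststoreSketch {
    // Reads a PKCS12 truststore file (assumed format) protected by the given password.
    static KeyStore loadTruststore(Path file, char[] password) throws Exception {
        KeyStore truststore = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(file)) {
            truststore.load(in, password);
        }
        return truststore;
    }

    // Builds a TLS context whose trust decisions are based only on the given truststore.
    static SSLContext contextTrusting(KeyStore truststore) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(truststore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: TruststoreSketch <truststore.p12> <password>");
            return;
        }
        KeyStore truststore = loadTruststore(Paths.get(args[0]), args[1].toCharArray());
        SSLContext context = contextTrusting(truststore);
        System.out.println("Loaded " + truststore.size() + " entries; protocol " + context.getProtocol());
    }
}
```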

However, this method introduces a lot of configuration overhead. It is therefore recommended that, instead of separate certificates for each node, you generate a single self-signed certificate and use it on all cluster nodes. GraphDB extends the standard Java TrustManager so that it automatically trusts its own certificate. This means that if all nodes in the cluster use a shared certificate, there is no need to add it to the Truststore.

Another difference from the standard Java TrustManager is that GraphDB has the option to disregard the hostname when validating certificates. If this option is disabled, hostnames are validated, so it is recommended to add all possible IP addresses and DNS names of all nodes that will use the certificate as Subject Alternative Names when generating the certificate (wildcards can be used as well).
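
As a rough illustration of collecting these names, the sketch below assembles the Subject Alternative Name value in the form accepted by the `-ext` option of the JDK keytool utility (`SAN=dns:...,ip:...`). The class name, the method name, and the node names and addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SanExtension {
    // Builds a keytool -ext value listing every DNS name and IP address of the cluster nodes.
    static String forKeytool(List<String> dnsNames, List<String> ipAddresses) {
        List<String> entries = new ArrayList<>();
        for (String dns : dnsNames) {
            entries.add("dns:" + dns);
        }
        for (String ip : ipAddresses) {
            entries.add("ip:" + ip);
        }
        return "SAN=" + String.join(",", entries);
    }

    public static void main(String[] args) {
        // Hypothetical two-node cluster.
        String san = forKeytool(
                List.of("node1.example.com", "node2.example.com"),
                List.of("10.0.0.11", "10.0.0.12"));
        // Prints: SAN=dns:node1.example.com,dns:node2.example.com,ip:10.0.0.11,ip:10.0.0.12
        System.out.println(san);
    }
}
```

The resulting value could then be passed to keytool when generating the shared certificate, for example `keytool -genkeypair -alias graphdb -keyalg RSA -keystore keystore.p12 -storetype PKCS12 -ext <value>`, where the alias and file name are placeholders.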

Both options to trust your own certificate and to skip the hostname validation are configurable from the graphdb.properties file:

  • graphdb.http.client.ssl.ignore.hostname - false by default
  • graphdb.http.client.ssl.trust.own.certificate - true by default
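
The sketch below shows how the two keys look in properties-file syntax and how the defaults listed above apply when a key is absent. It uses only java.util.Properties; the class and method names are hypothetical, and the way GraphDB itself reads graphdb.properties is not described here and may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SslClientSettings {
    static final String IGNORE_HOSTNAME = "graphdb.http.client.ssl.ignore.hostname";
    static final String TRUST_OWN_CERTIFICATE = "graphdb.http.client.ssl.trust.own.certificate";

    // Parses text written in the standard Java properties format.
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Hostname validation is skipped only when the key is explicitly set to true.
    static boolean ignoreHostname(Properties properties) {
        return Boolean.parseBoolean(properties.getProperty(IGNORE_HOSTNAME, "false"));
    }

    // The node's own certificate is trusted unless the key is explicitly set to false.
    static boolean trustOwnCertificate(Properties properties) {
        return Boolean.parseBoolean(properties.getProperty(TRUST_OWN_CERTIFICATE, "true"));
    }

    public static void main(String[] args) {
        // Example file content that overrides only the hostname setting.
        Properties properties = parse("graphdb.http.client.ssl.ignore.hostname = true\n");
        System.out.println("ignore hostname: " + ignoreHostname(properties));
        System.out.println("trust own certificate: " + trustOwnCertificate(properties));
    }
}
```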

Encryption at rest

GraphDB does not provide encryption for its data. All indexes and entities are stored in binary format on the hard drive. Note that anyone who gains access to the data directory can easily extract the data from them.

This is why it is recommended to implement some kind of disk encryption on your GraphDB server. There are multiple third-party solutions that can be used.

GraphDB has been tested on a LUKS-encrypted hard drive, and no noticeable performance impact has been observed. However, keep in mind that some impact may still be present and is highly dependent on your specific use case.