Configuring GraphDB

GraphDB 9.x relies on several main directories for configuration, logging, and data.

Directories

GraphDB Home

The GraphDB home defines the root directory where GraphDB stores all of its data. The home can be set through the system or config file property graphdb.home.

The default value for the GraphDB home directory depends on how you run GraphDB:

  • Running as a standalone server: the default is the same as the distribution directory.

  • All other types of installations: OS-dependent directory.

    • On Mac: ~/Library/Application Support/GraphDB.

    • On Windows: \Users\<username>\AppData\Roaming\GraphDB.

    • On Linux and other Unixes: ~/.graphdb.

Note

In the unlikely case of running GraphDB on an ancient Windows XP, the default directory is \Documents and Settings\<username>\Application Data\GraphDB.

GraphDB does not store any files directly in the home directory, but uses the following subdirectories for data or configuration:

Data directory

The GraphDB data directory defines where GraphDB stores repository data. The data directory can be set through the system or config property graphdb.home.data. The default value is the data subdirectory relative to the GraphDB home directory.

Config directory

The GraphDB config directory defines where GraphDB looks for user-definable configuration. The config directory can be set through the system property graphdb.home.conf.

Note

It is not possible to set the config directory through a config property as the value needs to be set before the config properties are loaded.

The default value is the conf subdirectory relative to the GraphDB home directory.

Work directory

The GraphDB work directory defines where GraphDB stores non-user-definable configuration. The work directory can be set through the system or config property graphdb.home.work. The default value is the work subdirectory relative to the GraphDB home directory.

Logs directory

The GraphDB logs directory defines where GraphDB stores log files. The logs directory can be set through the system or config property graphdb.home.logs. The default value is the logs subdirectory relative to the GraphDB home directory.

Note

When running GraphDB as deployed .war files, the logs directory will be the subdirectory graphdb within Tomcat’s logs directory.

Important

Even though GraphDB provides the means to specify separate custom directories for data, configuration and so on, it is recommended to specify the home directory only. This ensures that every piece of data, configuration, or logging, is within the specified location.
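As a rough sketch of this recommendation, a config file might set only the home directory and leave the individual overrides commented out. All paths below are hypothetical placeholders, not values shipped with GraphDB.

```properties
# Recommended: set only the home directory; data, conf, work, and logs
# then default to subdirectories of it
graphdb.home = /opt/graphdb-instance

# Non-recommended alternative: override individual directories
# (graphdb.home.conf cannot be set here, only as a system property)
#graphdb.home.data = /mnt/graphdb-data
#graphdb.home.logs = /var/log/graphdb
```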

Step-by-step guide:

  1. Choose a directory for GraphDB home, e.g., /opt/graphdb-instance.

  2. Create the directory /opt/graphdb-instance.

  3. (Optional) Copy the subdirectory conf from the distribution into /opt/graphdb-instance.

  4. Start GraphDB with graphdb -Dgraphdb.home=/opt/graphdb-instance.

GraphDB creates the missing subdirectories data, conf (if you skipped that step), logs, and work.

Checking the configured directories

When GraphDB starts, it logs the actual value for each of the above directories, e.g.:

GraphDB Home directory: /opt/test/graphdb-se-9.x.x
GraphDB Config directory: /opt/test/graphdb-se-9.x.x/conf
GraphDB Data directory: /opt/test/graphdb-se-9.x.x/data
GraphDB Work directory: /opt/test/graphdb-se-9.x.x/work
GraphDB Logs directory: /opt/test/graphdb-se-9.x.x/logs

Configuration

GraphDB has a single config file, graphdb.properties. It is provided in the distribution under conf/graphdb.properties, and GraphDB loads it from that location.

This file contains a list of config properties defined in the following format:

propertyName = propertyValue, i.e., using the standard Java properties file syntax.

Each config property can be overridden through a Java system property with the same name, provided either in the environment variable GDB_JAVA_OPTS or on the command line.
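To illustrate the syntax and the override rule, here is a minimal sketch of a config file entry. The property name comes from this guide; both paths are hypothetical placeholders.

```properties
# Lines starting with # are comments
# Whitespace around = is optional in Java properties syntax
graphdb.home.logs = /var/log/graphdb

# The value above is ignored if the same property is passed as a Java system
# property, for example -Dgraphdb.home.logs=/tmp/graphdb-logs, either on the
# command line or in GDB_JAVA_OPTS
```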

Configuration properties

The properties are of four types and are detailed below.

General properties

The general properties define some basic configuration values that are shared with all GraphDB components and types of installation:

Property name: Description

  • graphdb.home: Defines the GraphDB home directory.

  • graphdb.home.data: Defines the GraphDB data directory.

  • graphdb.home.conf: (Only as a system property) Defines the GraphDB conf directory.

  • graphdb.home.work: Defines the GraphDB work directory.

  • graphdb.home.logs: Defines the GraphDB logs directory.

  • graphdb.dist: If graphdb.dist is set and graphdb.home is not, GraphDB will look for the data, conf, logs, etc. directories there (unless they are explicitly set).

  • graphdb.workbench.home: The place where the source for GraphDB Workbench is located.

  • graphdb.license.file: Sets a custom path to the license file to use.

  • graphdb.page.cache.size: The amount of memory to be taken by the page cache.

  • graphdb.pidfile: The full path to the file where the GraphDB process ID is stored.

  • graphdb.foreground: Tells GraphDB not to close stdout/stderr; the user can still choose whether to daemonize.

  • graphdb.heapdump.enable: GraphDB can dump the heap on out-of-memory errors in order to provide insight into the cause of excessive memory usage. This property enables or disables the heap dump. Default is true.

  • graphdb.heapdump.path: File to write the heap dump to. The default is the heapdump.hprof file in the configured logs directory. See also the properties graphdb.home and graphdb.home.logs.
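A few of the general properties above might be combined as in the following sketch. The property names are taken from the list above; every path is a hypothetical placeholder to adapt to your installation.

```properties
# Custom license location (hypothetical path)
graphdb.license.file = /opt/graphdb-instance/conf/graphdb.license

# Where to record the process ID (hypothetical path)
graphdb.pidfile = /var/run/graphdb.pid

# Keep heap dumps on out-of-memory errors (true is already the default)
# and write them to an explicit file instead of the logs directory
graphdb.heapdump.enable = true
graphdb.heapdump.path = /var/tmp/graphdb-heapdump.hprof
```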

URL properties

Hint

Jump ahead to Typical use cases for a list of examples that cover URL properties usage.

In certain cases, GraphDB needs to construct a URL that refers to itself:

  • The repository list in Setup ‣ Repository manager where each repository provides a link that can be used to access the repository via the REST API.

  • Setting up a cluster via Setup ‣ Cluster manager where the system needs the repository URLs to attach the cluster nodes correctly.

  • When a master node instructs a worker node to provide its data for backup.

When GraphDB is accessed directly (without a reverse proxy), it will figure out the correct URLs based on the URL of incoming requests. For example, if GraphDB is accessed using the URL http://graphdb.example.com:7200/, it will construct URLs like http://graphdb.example.com:7200/repositories/repoId.

When GraphDB is accessed via a reverse proxy, the server will not see the actual URL used to access the server and thus it cannot determine a valid external URL on its own. There are two specific setups:

  • The external URL as seen via the proxy uses / as its root, for example, http://rdf.example.com/.

    • GraphDB will map the external / to its own / automatically; there is no need to add or change any configuration.

    • GraphDB will still not know how to construct external URLs, so setting graphdb.external-url is recommended even though it might appear to work without setting it.

  • The external URL as seen via the proxy uses /something as its root (i.e., something in addition to the /), for example, http://example.com/rdf.

    • GraphDB cannot map this automatically and needs to be configured using the property graphdb.vhosts or graphdb.external-url (see below).

    • This will instruct GraphDB that URLs beginning with http://example.com/rdf/ map to the root path / of the GraphDB server.

The URL properties determine how GraphDB constructs URLs that refer to itself, as well as what URLs are recognized as URLs to access the GraphDB installation. GraphDB will try to auto-detect those values based on URLs used to access it, and the network configuration of the machine running GraphDB. In certain setups involving virtualization or a reverse proxy, it may be necessary to set one or more of the following properties:

Property: Description

  • graphdb.vhosts: A comma-delimited list of virtual host URLs that can be used to access GraphDB. Setting this property is necessary when GraphDB needs to be accessed behind a reverse proxy and the path of the external URL is different from /, for example http://example.com/rdf.

  • graphdb.external-url: Sets the canonical external URL. This property implies graphdb.vhosts. If you have provided an explicit value for both graphdb.vhosts and graphdb.external-url, then the URL specified for graphdb.external-url must be one of the URLs in the value for graphdb.vhosts.

    When a reverse proxy is in use and most users will access GraphDB through the proxy, it is recommended to set this property instead of, or in addition to, graphdb.vhosts, as it will let GraphDB know that the canonical external URL is the one seen through the proxy.

    Tip

    Prior to GraphDB 9.8, only the graphdb.external-url property existed. You can keep using it as is.

  • graphdb.hostname: Overrides the hostname reported by the machine.

Note

For remote locations, the URLs are always constructed using the base URL of the remote location as specified when the location was attached.

Typical use cases

  1. GraphDB is behind a reverse proxy whose URL path is / and most clients will use the proxy URL.

    This setup will appear to work out of the box without setting any of the URL properties, but it is recommended to set graphdb.external-url. Example URLs:

    • Internal URL: http://graphdb.example.com:7200/

    • External URL used by most clients: http://rdf.example.com/

    The corresponding configuration is:

    # Recommended even though it may appear to work without setting this property
    graphdb.external-url = http://rdf.example.com/
    
  2. GraphDB is behind a reverse proxy whose URL path is /something and most clients will use the proxy URL.

    This configuration requires setting graphdb.external-url (recommended) or graphdb.vhosts to the correct URLs as seen externally through the proxy. Example URLs:

    • Internal URL: http://graphdb.example.com:7200/

    • External URL used by most clients: http://example.com/rdf/

    The corresponding configuration is:

    # Required and recommended
    graphdb.external-url = http://example.com/rdf/
    
    # Non-recommended alternative to the above
    #graphdb.vhosts = http://example.com/rdf/
    
  3. The GraphDB Workbench is used to set up a cluster and is accessed using a localhost URL.

    The system will construct URLs using the hostname of the machine as reported by the machine. This works well with consistent network configurations. When the configuration is inconsistent, for example the hostname is not resolvable from the other machines that need to join the cluster, you may need to set graphdb.hostname to the correct hostname value or avoid using localhost URLs.

    Note that using localhost URLs is recommended only in limited scenarios, such as accessing GraphDB only from the machine where it is running.

  4. Complex network configurations with GraphDB cluster

    Some complex network configurations involve a reverse proxy used to access the master node of the GraphDB cluster but the communication between the cluster nodes does not use the proxy. In such cases, you may need to set more than one of the URL properties to match the specific needs.

    This is also almost always the case with Docker, Kubernetes, or other virtualization or network isolation methods involving setting up a GraphDB cluster.

    Let’s take the following example:

    • Master node is accessible at http://master.example.com:7200/ via a direct connection.

    • Worker node 1 is accessible at http://worker1.example.com:7200/ via a direct connection.

    • Worker node 2 is accessible at http://worker2.example.com:7200/ via a direct connection.

    • Cluster-internal communication needs to use the above addresses.

    • GraphDB users have no direct access to the master URL http://master.example.com:7200/ and instead must use a URL through the reverse proxy, for example http://example.com/graphdb/.

    Matching configuration (on the master node):

    # Configures access through the proxy and lists the internal URL explicitly
    # as we need to use that URL as the value of graphdb.external-url below
    graphdb.vhosts = http://example.com/graphdb/, http://master.example.com:7200/
    
    # Sets the internal URL of the master as the canonical external URL so that
    # cluster management and cluster backup will use the correct URLs
    graphdb.external-url = http://master.example.com:7200/
    

    No extra configuration is needed on the worker nodes.
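Use case 3 above has no accompanying configuration, so here is a minimal sketch of what it might look like. The hostname is a hypothetical placeholder; it should be a name that the other cluster machines can resolve.

```properties
# Overrides the hostname reported by the machine, for cases where that
# hostname is not resolvable from the other nodes (hypothetical value)
graphdb.hostname = graphdb-node1.example.com
```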

Network properties

The network properties control how the standalone application listens on a network. These properties correspond to the attributes of the embedded Tomcat Connector. For more information, see Tomcat’s documentation.

Each property is composed of the prefix graphdb.connector. + the relevant Tomcat Connector attribute. The most important property is graphdb.connector.port, which defines the port to be used. The default is 7200.

In addition, the sample config file provides an example for setting up SSL.

Note

The graphdb.connector.<xxx> properties are only relevant when running GraphDB as a standalone application.
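As a sketch of the prefix rule, the fragment below changes the port and outlines an SSL setup. The attribute names after graphdb.connector. are standard Tomcat Connector attributes; the port, keystore path, and password are hypothetical placeholders, and the SSL lines should be checked against the sample config file shipped with your distribution.

```properties
# Listen on a non-default port (the default is 7200)
graphdb.connector.port = 7201

# SSL outline, left commented out; verify against the sample config file
#graphdb.connector.SSLEnabled = true
#graphdb.connector.scheme = https
#graphdb.connector.secure = true
#graphdb.connector.keystoreFile = /path/to/keystore
#graphdb.connector.keystorePass = <password>
```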

Engine properties

You can configure the GraphDB Engine through a set of properties composed of the prefix graphdb.engine. + the relevant engine property. These properties correspond to the properties that can be set when creating a repository through the Workbench or through a .ttl file.

Note

The properties defined in the config override the properties for each repository, regardless of whether you created the repository before or after setting the global value of an engine property. As such, the global override should be used only in specific cases. For normal everyday needs, set the corresponding properties when you create a repository.

Property name: Description (Default value)

  • graphdb.engine.entity-pool-implementation: Defines the Entity Pool implementation for the whole installation. Possible values are transactional or classic. The transactional-simple implementation is no longer supported. Default value: transactional.

  • graphdb.persistent.parallel.inferencers: Since GraphDB 8.6.1, inferencers for the Parallel loader are shut down at the end of each transaction to minimize GraphDB’s memory footprint. This can be a problem when many small insertions are made in quick succession, as inferencer initialization can be fairly slow. Enabling this setting reverts to the old behavior, where inferencers are shut down only when the repository is released. Default value: false.

  • graphdb.engine.entity.validate: A global setting that ensures IRI validation in the entity pool. Validation is performed only when an IRI is seen for the first time (i.e., when it is created in the entity pool). For consistency, not only IRIs coming from RDF serializations but also all new IRIs (via API or SPARQL) are validated in the same way. This property can be turned off by setting its value to false. Default value: true.

Note

IRI validation makes the import of broken data more problematic: in such a case, you would have to change a config property and restart your GraphDB instance instead of changing the setting per import.
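For illustration, a global engine override in the config file might look like the sketch below. The property names and values are taken from the descriptions above; whether any of them is appropriate depends on your data and workload, and each one applies to every repository in the installation.

```properties
# Explicitly select the default Entity Pool implementation
graphdb.engine.entity-pool-implementation = transactional

# Skip IRI validation, e.g. to import known-broken data (requires a restart)
graphdb.engine.entity.validate = false

# Keep Parallel loader inferencers alive between transactions
graphdb.persistent.parallel.inferencers = true
```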

Configuring logging

GraphDB uses logback to configure logging. The default configuration is provided as logback.xml in the GraphDB conf directory.

Jolokia security policy

The GraphDB Jolokia security policy is provided as the jolokia-access.xml file in the GraphDB conf directory. Open it to see the default restrictions.

The default settings can be overridden as follows:

  • If graphdb.home.conf is not explicitly set, you can configure conf/jolokia-access.xml if necessary.

  • If graphdb.home.conf is explicitly set, but the jolokia-access.xml file is not placed in the respective directory, the default config will load.

  • If graphdb.home.conf is explicitly set, and jolokia-access.xml is placed in the respective directory, this file will load.

See more about the Jolokia security.