How to Install GraphDB in AWS¶
GraphDB can be deployed on Amazon Web Services by following the general installation instructions. You can find information regarding the costs of running a GraphDB instance on the AWS Services website.
This documentation will walk you through the process of setting up the necessary environment for deploying GraphDB on AWS.
Note
Ontotext maintains a Terraform module that automates this entire procedure. Learn more about how to use it at our GitHub repository.
Architecture¶
The GraphDB architecture diagram shows the deployment architecture for GraphDB on EC2 instances in the AWS cloud platform. It illustrates the key components and their interactions to provide a high-level understanding of the system’s architecture and how it should be deployed.

Note
There are no third-party integration points on the default GraphDB deployment.
Prerequisites¶
There are several prerequisites for running a GraphDB instance on AWS:
Access to an AWS account (we recommend using an Identity and Access Management (IAM) user for the deployment instead of the root user account)
An active GraphDB license, which is required to use the Enterprise functionalities of the database
A shell script used to initialize the EC2 instance
Note
The GraphDB Terraform module contains a Terraform template you can use when creating your shell script. If you use the Terraform template, you will need to replace the placeholder values of all variables with your actual values.
Technical requirements¶
The following AWS services are required to complete the GraphDB deployment on AWS:
Service | Description |
---|---|
Virtual Private Cloud (VPC) | Allows for the creation of a private network in AWS. |
Elastic Compute Cloud (EC2) | A server instance that Elastic Kubernetes Service will be using as a managed node or a server instance that will be used for hosting the database application. |
Network Load Balancer (NLB) | For load balancing the GraphDB cluster nodes |
Elastic Block Store (EBS) | EBS volumes will be used for storing the data |
AWS Identity and Access Management (IAM) | Provides user and access management for your GraphDB deployment |
AWS Systems Manager | Various GraphDB configurations are saved in the Parameter Store |
Simple Storage Service (S3) | S3 buckets will be used for storing the backups |
Required skills¶
Note
Deploying GraphDB on AWS EC2 requires a combination of skills in AWS infrastructure management, database administration, and system troubleshooting. Acquiring these skills may involve hands-on experience, self-study, online resources, and formal training programs provided by AWS or other educational platforms.
The following skills and knowledge are typically required in order to successfully deploy GraphDB on AWS EC2:
AWS Fundamentals | Familiarity with Amazon Web Services (AWS) and understanding of its core concepts, such as EC2 instances, security groups, VPCs and IAM roles. Knowledge of how to navigate the AWS Management Console and interact with AWS services is essential. |
---|---|
EC2 Instance Management | Proficiency in creating and managing EC2 instances. This includes selecting the appropriate instance type, configuring security settings, managing storage (EBS volumes), and understanding EC2 instance lifecycle management. |
Networking and Security | Understanding of networking concepts in AWS, including VPC (Virtual Private Cloud) configuration, subnets, routing tables, and security groups. Knowledge of how to set up inbound and outbound traffic rules to allow communication with GraphDB. |
Linux Administration | Proficiency in Linux command-line interface (CLI) and basic administration tasks. This includes SSH access to EC2 instances, navigating the file system, managing permissions, installing packages, and configuring system settings. |
Database Management | Knowledge of GraphDB and its deployment requirements. Understanding of how to configure GraphDB settings, including database storage, memory allocation, and repository creation. |
Database Backup and Recovery | Familiarity with backup and recovery strategies for GraphDB on AWS. Knowledge of AWS services like Amazon S3 for data backups and restoration processes. |
Monitoring and Troubleshooting | Proficiency in monitoring the health and performance of GraphDB instances on AWS. Understanding of logging, monitoring and troubleshooting techniques using AWS CloudWatch, EC2 instance logs, and GraphDB diagnostic tools. |
High Availability and Scalability | Knowledge of implementing high availability and scalability for GraphDB on AWS. This may involve using features like EC2 Auto Scaling, load balancers, and multi-Availability Zone (AZ) deployments. |
Infrastructure as Code (IaC) | Familiarity with Infrastructure as Code principles and tools like AWS CloudFormation or Terraform. This enables automating the provisioning and configuration of GraphDB infrastructure on AWS. |
Security Best Practices | Understanding of security best practices for AWS deployments, including data encryption, access controls, identity and access management, and compliance considerations. |
Setting up your Virtual Private Cloud (VPC)¶
Go to the VPC dashboard and start creating a new VPC
Select the option that creates the VPC together with its other networking components. This will allow you to configure and create all other networking components such as subnets, gateways, and more
Enter the following VPC configurations:
Name: enter a descriptive name by which to recognize your VPC
Number of Availability Zones (AZs): 3
NAT gateways: 1 per AZ (this will enable external Internet access)
DNS options: check both checkboxes (DNS hostnames and DNS resolution)
Confirm the creation of the VPC
Once you’ve completed this process, you will see various status messages as the system creates the subnets, NAT gateways, and other components of the VPC. This may take several minutes; you will know that it is finished when all the status messages have turned green and a button for continuing appears at the bottom.
Setting up your Route 53 private hosted zone¶
The GraphDB Raft implementation requires static addresses. This is achieved by creating a private hosted zone in Route 53 and registering the instances there.
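Go to the Route 53 dashboard and select the hosted zones page from the navigation menu on the left
Start creating a new hosted zone
Enter a domain name, such as graphdb.cluster
Choose the private hosted zone type
Under the VPC association settings, select your region and the VPC that you created
Confirm the creation of the hosted zone
Hint
You may want to write down the hosted zone ID, as you will need it later.
Note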
Later, you will also need to create “A” records for the instances.
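Each “A” record maps a stable name in the private hosted zone to the private IP address of one instance. As a minimal sketch, the Python snippet below builds the change batch structure accepted by the Route 53 ChangeResourceRecordSets operation. The record name `node-1.graphdb.cluster`, the IP address, and the 60-second TTL are illustrative assumptions, not values required by GraphDB.

```python
import ipaddress
import json


def a_record_change_batch(record_name, private_ip, ttl=60):
    """Build a Route 53 change batch that upserts a single "A" record."""
    # Fail early on anything that is not a valid IPv4 address.
    ipaddress.IPv4Address(private_ip)
    return {
        "Comment": "GraphDB cluster node address",
        "Changes": [
            {
                # UPSERT creates the record or overwrites an existing one,
                # which suits instances that are recreated with a new IP.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": private_ip}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical node name and private IP address.
    batch = a_record_change_batch("node-1.graphdb.cluster", "10.0.1.10")
    print(json.dumps(batch, indent=2))
```

The printed JSON can be saved to a file and passed to `aws route53 change-resource-record-sets` through its `--change-batch` option, together with `--hosted-zone-id` set to the ID of your private hosted zone.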
Creating an S3 bucket¶
Tip
This step is optional, but recommended.
GraphDB can store backups to S3 and, if needed, restore from them. To create an S3 bucket:
Go to the S3 console and start creating a new bucket
Enter a name for the bucket that is globally unique among all S3 buckets
Select your region, scroll down, and confirm the creation of the bucket
Once you’ve completed this process, you should see a “Successfully created” message. We recommend that you block all public access to the bucket so that it isn’t accessible by anyone else.
Importing a certificate into AWS Certificate Manager (ACM)¶
Tip
This step is optional.
While serving GraphDB requests over a secured and encrypted connection is not strictly required, it is highly recommended. This section goes over the process of importing a certificate into AWS Certificate Manager (ACM), which can then be used in the next section when creating the load balancer.
Go to the ACM console and select the certificate import option from the navigation menu on the left, or use the import button at the top
Paste the certificate body
Paste the certificate private key
Warning
You need to remove the password for the key before pasting it
You can optionally paste the certificate chain, then proceed to the next step
You can optionally add tags, then proceed to the next step again
Confirm the import
Open the imported certificate and note its Amazon Resource Name (ARN); you will need it when creating the load balancer listener.
Setting up the Load Balancer¶
Go to the EC2 dashboard and select the load balancers page from the navigation menu on the left
In the top-right, make sure you are in the region for which you want to create a Load Balancer
Start creating a new load balancer
Choose the Network Load Balancer type and enter its configurations
Optionally, you can create a TLS listener and remove the TCP listener on port 80:
Remove the listener on port 80 (leaving it will result in unencrypted traffic)
Add a new listener and choose the following configurations:
Protocol: select TLS
Port: 443
Default action: select the target group from the previous step
Security policy: leave the security policy set to the recommended one
Certificate: select the certificate that you imported in ACM and leave the ALPN policy set to None
Create a target group under Listener and routing
Enter the following configurations:
Target type: leave as instances
Name: enter a descriptive name by which to recognize your group
Protocol: TCP
Port: 7200
Health check path: /rest/cluster/node/status
Under the advanced health check settings, override the port to be 7201
Go back to the load balancer creation page and select the target group that was created in the previous step
Return to the load balancer creation page and refresh the list of Target Groups
Select the one you just created, and confirm the creation of the load balancer
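The target group health check calls `/rest/cluster/node/status` on port 7201 of each instance. As a rough sketch, the same request can be reproduced from a machine inside the VPC to see what the load balancer sees. The snippet assumes the endpoint is reachable over plain HTTP without authentication; the host value is hypothetical.

```python
import urllib.request

STATUS_PATH = "/rest/cluster/node/status"


def health_check_url(host, port=7201):
    """Build the URL that the target group health check is configured to call."""
    return f"http://{host}:{port}{STATUS_PATH}"


def is_healthy(url, timeout=5.0):
    """Return True when the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return False


if __name__ == "__main__":
    # Hypothetical private IP address of one GraphDB instance.
    print(health_check_url("10.0.1.10"))
```

Calling `is_healthy(health_check_url(host))` for each instance gives a quick indication of whether the target group is likely to mark it as healthy.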
Setting up instance role and profile permissions¶
You have to grant GraphDB EC2 instances certain permissions so that they can perform several tasks in AWS. This section describes what permissions they need, what they are used for, and how to create them. To do this, we will create an instance profile and then attach it to the instances.

To create a policy:
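Go to the Identity and Access Management (IAM) dashboard and select the policies page from the navigation menu on the left
Start creating a new policy and go to the JSON editor
Replace the JSON script with the one for the permission you are creating
Proceed to the next step
Enter a name for the policy
Confirm the creation of the policy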
Warning
You need to create a different policy for each of the JSON scripts listed below.
Allow the EC2 instance to read, write, and list objects in S3 (Optional)¶
If you are planning to store GraphDB backups to S3, the EC2 instance needs to be able to read, write, and list objects in S3. To do this, paste the following JSON in the appropriate field when creating a policy, changing graphdb-backup-bucket to the name of your S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:*Object",
        "s3:GetAccelerateConfiguration",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::graphdb-backup-bucket",
        "arn:aws:s3:::graphdb-backup-bucket/*"
      ]
    }
  ]
}
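If you manage several environments, substituting the bucket name by hand is easy to get wrong. The following sketch generates the same policy document for a given bucket name; the function name and the bucket name `my-graphdb-backups` are hypothetical.

```python
import json


def s3_backup_policy(bucket_name):
    """Return the S3 backup policy from this section for the given bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:*Object",
                    "s3:GetAccelerateConfiguration",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                ],
                # The bucket ARN covers bucket-level actions such as listing;
                # the "/*" ARN covers the objects stored inside the bucket.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_backup_policy("my-graphdb-backups"), indent=2))
```

The output can be pasted into the JSON editor when creating the policy.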
Allow the listing of EC2 instances¶
The user data script that we configure later on will need permissions to list the EC2 instance where GraphDB runs. To grant the necessary permissions, paste the following JSON in the appropriate field when creating a policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    }
  ]
}
Note
Some of these policies are needed only if the public Terraform module is used.
Allow the listing, creating, attaching, and tagging of EBS volumes¶
When instances start, the user data script will need to search for an available EBS volume to attach to them or, if none is available, create a new one. Alternatively, you can create the volumes separately, attach them to the instances, and mount them to the appropriate location. To grant the permissions that the user data script needs, paste the following JSON in the appropriate field when creating a policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateVolume",
        "ec2:AttachVolume",
        "ec2:DescribeVolumes"
      ],
      "Resource": "*"
    }
  ]
}
Because the volumes need to be tagged when they are created, the EC2 instance also requires permissions for tagging the volumes. Repeat the same steps as above, but use the following JSON for the policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:snapshot/*"
      ],
      "Condition": {
        "StringEquals": {
          "ec2:CreateAction": [
            "CreateVolume",
            "CreateSnapshot"
          ]
        }
      }
    }
  ]
}
Allow adding of records to Route 53 private hosted zone¶
The user data script will also have to be able to create “A” records in a Route 53 private hosted zone. To add the needed permissions, you will first need to find the ID of your hosted zone:
Go to the Route 53 dashboard
Select your zone
Expand the hosted zone details
Copy the hosted zone ID
Once you have obtained your hosted zone ID, replace <zone_id> with the hosted zone ID in the JSON below, and paste it in the appropriate field when creating a policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:GetHostedZone",
        "route53:ListResourceRecordSets",
        "route53:ListHostedZones",
        "route53:ChangeResourceRecordSets",
        "route53:GetHostedZoneCount",
        "route53:ListHostedZonesByName"
      ],
      "Resource": "arn:aws:route53:::hostedzone/<zone_id>"
    }
  ]
}
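Because several of the policies contain values that must be replaced by hand, it can help to check each document before pasting it. The sketch below is a hypothetical helper, not part of any AWS or Ontotext tooling: it reports JSON syntax errors and any `<placeholder>` values that were left unreplaced.

```python
import json
import re

# Matches template placeholders such as <zone_id>.
PLACEHOLDER = re.compile(r"<[A-Za-z_][A-Za-z0-9_]*>")


def check_policy(policy_text):
    """Return a list of problems found in an IAM policy document."""
    problems = []
    leftover = sorted(set(PLACEHOLDER.findall(policy_text)))
    if leftover:
        problems.append(f"unreplaced placeholders: {leftover}")
    try:
        policy = json.loads(policy_text)
    except json.JSONDecodeError as error:
        problems.append(f"invalid JSON: {error}")
        return problems
    if not isinstance(policy, dict):
        problems.append("policy must be a JSON object")
        return problems
    if policy.get("Version") != "2012-10-17":
        problems.append("unexpected policy Version")
    if not policy.get("Statement"):
        problems.append("policy has no statements")
    return problems


if __name__ == "__main__":
    template = (
        '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", '
        '"Action": ["route53:ChangeResourceRecordSets"], '
        '"Resource": "arn:aws:route53:::hostedzone/<zone_id>"}]}'
    )
    print(check_policy(template))
```

An empty list means that the document parsed correctly and no placeholders remain; it does not verify that the listed actions or resources are sufficient.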
Creating an IAM role and attaching policies¶
Once you’ve created your different policies, you can create a role and associate it with the newly created policies:
Go to the IAM dashboard, and select the roles page from the navigation menu on the left
Start creating a new role
Select EC2 as the trusted service and click “Next”
Search for the policies that you created in the previous sections and select them
Additionally, we recommend you add a couple of optional policies:
The AmazonSSMFullAccess policy allows you to gain access to the EC2 instances using AWS Systems Manager
The CloudWatchAgentServerPolicy policy is required if you are planning on scraping the GraphDB Prometheus endpoints and pushing the metrics to Amazon CloudWatch
Once you’ve selected all of your policies, proceed to the next step
Add a name for the role and confirm its creation
Launching GraphDB instances¶
Setting up the Launch template¶
Before you can launch your GraphDB instance, you will need to create a launch template and add a security group to it.
Note
Launch templates describe what configurations will be used for the machines
Security groups act like a firewall for the resources in AWS
Go to the EC2 dashboard and select the launch templates page from the navigation menu on the left
Start creating a new launch template and fill in the following configurations:
Name: enter a descriptive name by which to recognize your launch template
Auto Scaling guidance: set to On
AMI: select Ubuntu 22.04
Note
AMI will be available in the future.
Instance type: select a type such as r6i.2xlarge, which should be more than sufficient for a repository with one billion triples
Key pair: not required and should be left as is
Under the network settings, choose to create a new security group:
Select your VPC
Leave the default outbound rule set as shown unless you want to restrict it
Add an inbound rule for port 7200 for the CIDR blocks that will be allowed to access GraphDB
Add an inbound rule for port range 7200 (for the proxy) - 7201 (for GraphDB) and add the subnets of the load balancer
Add an inbound rule for port ranges 7200-7201 and 7300-7301 and add the private subnets
Add a description
Confirm the creation of the security group
Return to the Launch template creation page, refresh the list of security groups, and select the newly created security group
Under the advanced settings, fill in the following configurations:
IAM instance profile: select the role you created in the previous section
At the very bottom of the form, configure the user data script that will be responsible for installing the necessary tools and GraphDB on the machines.
Tip
An example script is available in the Ontotext-AD GitHub repository; just be sure to replace all Terraform template variables.
Note
If the user data script is already base64-encoded, check the box under the script field.
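Click on the orange button to create the launch template
On the confirmation screen that appears, continue back to the EC2 dashboard
Select the Auto Scaling groups page from the bottom of the navigation menu on the left
Start creating a new Auto Scaling group and fill in the following configurations:
Enter a descriptive name by which to recognize your auto scaling group
Select the launch template that was created previously and click the orange Next button
Select the VPC that was created previously
Select all three private subnets and click Next
Under the load balancing settings, choose to attach the group to an existing load balancer
Choose to select from your load balancer target groups, then select the load balancer target group you created earlier
(Optional) Under the health check settings, you can tune the health check grace period, as well as turn on the load balancer health checks, which will allow the load balancer to trigger the recreation of an instance
Click Next
Under the group size settings, enter the number of nodes in your cluster (in this case, 3) for the desired, minimum, and maximum capacity
Click Next until you reach the Review page
Confirm the creation of the Auto Scaling group at the bottom.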
Eventually, the three new instances will be listed on the Instances screen available on the left sidebar.
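The launch template accepts the user data script either as plain text or already base64-encoded. As a minimal sketch of the encoded variant, the snippet below encodes a script and checks two things first: that it starts with `#!`, which cloud-init expects of a shell script, and that it fits within the 16 KB limit EC2 places on raw user data. The sample script content is a placeholder, not the Ontotext example script.

```python
import base64

# EC2 limits user data to 16 KB in raw form, before base64 encoding.
MAX_USER_DATA_BYTES = 16 * 1024


def encode_user_data(script):
    """Validate a user data shell script and return it base64-encoded."""
    if not script.startswith("#!"):
        raise ValueError("a user data shell script must start with '#!'")
    raw = script.encode("utf-8")
    if len(raw) > MAX_USER_DATA_BYTES:
        raise ValueError(f"user data is {len(raw)} bytes, above the 16 KB limit")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Placeholder script; a real one installs and configures GraphDB.
    print(encode_user_data("#!/bin/bash\necho 'initializing instance'\n"))
```

If you paste the encoded output into the user data field, remember to check the box indicating that the script is already base64-encoded.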
Creating a cluster¶
In order to create the cluster, you will need to get the addresses of the EC2 instances.
Go to the Route 53 dashboard and open your hosted zone
Write down the names of all records of type “A”
Once you’ve written down all record names, you can create a cluster by following the Creating and Managing a Cluster documentation from one of the instances.
Tip
The recommended way to gain access to the instances is to attach the AmazonSSMFullAccess policy, and then use the AWS CLI to connect to an instance by its ID:
aws ssm start-session --target i-04d62ace38b78d994
After doing this, you should be able to use standard utilities like sudo or su to change the user — for example, ubuntu.
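Once you have the “A” record names, they need to be turned into node addresses for the cluster creation request. The sketch below assumes that the request body is a JSON object with a `nodes` list of `host:port` entries and that nodes talk to each other on port 7300 (one of the ports opened between the private subnets above). Both are assumptions; check them against the Creating and Managing a Cluster documentation before use.

```python
import json


def cluster_payload(record_names, rpc_port=7300):
    """Turn Route 53 "A" record names into a node list for cluster creation."""
    nodes = []
    for name in record_names:
        # Record names copied from Route 53 may carry a trailing dot.
        host = name.strip().rstrip(".")
        if not host:
            raise ValueError("empty record name")
        nodes.append(f"{host}:{rpc_port}")
    if len(set(nodes)) != len(nodes):
        raise ValueError("duplicate record names")
    return {"nodes": nodes}


if __name__ == "__main__":
    # Hypothetical record names in the graphdb.cluster private hosted zone.
    names = [
        "node-1.graphdb.cluster.",
        "node-2.graphdb.cluster.",
        "node-3.graphdb.cluster.",
    ]
    print(json.dumps(cluster_payload(names), indent=2))
```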
Opening GraphDB instances¶
After all instances are running, you can launch your GraphDB instance:
Access the load balancer that you created
Copy its DNS name
Paste it in the address bar of your browser and press Enter
Updating GraphDB configurations and versions¶
Because GraphDB and its configurations are baked into the AMI, you will need to recreate the EC2 instances when changing your GraphDB configuration or updating to a newer minor version. You can do this either by manually stopping each individual instance, or by scaling the cluster out and then scaling it back in. This section describes both methods in detail.
Note
Make sure your user data script mounts the storage back to the instances.
Stopping individual EC2 instances¶
The faster way to update to a newer minor version and its configuration is to stop each individual EC2 instance. The downside to this method is that it reduces the high availability (HA) of the cluster while an instance is being recreated. In other words, if another node fails during that time, a three-node cluster will be unable to process writes. The process is simple:
Update the AMI or the user data script in the launch template
Terminate the instances one by one, starting with the follower nodes, and leaving the leader node to be the last instance to be terminated
Note
To avoid compatibility issues, also refer to the Migrating GraphDB Configurations documentation.
Warning
When you terminate an instance, wait for the new one to be started. Then verify that it has successfully rejoined the cluster and that it is in sync before proceeding with the next one.
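The reasoning behind this procedure can be made concrete. GraphDB clusters are based on Raft, which needs a majority of nodes to accept writes, so a three-node cluster tolerates only one missing node. The sketch below computes that majority and orders the nodes so that followers are terminated before the leader. The `address` and `nodeState` field names are assumptions about how node status is represented, used here only for illustration.

```python
def quorum(cluster_size):
    """Smallest majority of a Raft cluster with the given number of nodes."""
    return cluster_size // 2 + 1


def can_process_writes(cluster_size, healthy_nodes):
    """Writes need at least a majority of the configured nodes to be healthy."""
    return healthy_nodes >= quorum(cluster_size)


def termination_order(nodes):
    """Return node addresses with the followers first and the leader last."""
    leaders = [node for node in nodes if node["nodeState"] == "LEADER"]
    if len(leaders) != 1:
        raise ValueError("expected exactly one leader before starting the update")
    followers = [node for node in nodes if node["nodeState"] != "LEADER"]
    return [node["address"] for node in followers + leaders]


if __name__ == "__main__":
    # Hypothetical status of a three-node cluster.
    status = [
        {"address": "node-1.graphdb.cluster", "nodeState": "LEADER"},
        {"address": "node-2.graphdb.cluster", "nodeState": "FOLLOWER"},
        {"address": "node-3.graphdb.cluster", "nodeState": "FOLLOWER"},
    ]
    print(termination_order(status))
    # One instance being recreated leaves 2 of 3 nodes: writes still work.
    print(can_process_writes(3, 2))
    # A second failure leaves 1 of 3 nodes: writes are blocked.
    print(can_process_writes(3, 1))
```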
Scaling the cluster out and then back in¶
You can also recreate the EC2 instances by scaling the cluster out and in. The advantage of this approach is that the HA will not be impacted. However, the cluster will need to replicate its state to the new nodes. This can take a significant amount of time, especially with bigger-sized repositories.
Update the AMI or the user data script in the launch template
Double the size of the cluster
Note
Change the minimum, maximum and desired size of the auto scaling group
Once the new instances are started, join them to the cluster and wait until they are healthy and in sync with the cluster
Note
Make sure to join the nodes with a single API call to avoid replicating the cluster state multiple times
Add scale-in protection on the new nodes
Remove the old nodes from the cluster
Change the minimum, maximum and desired size of the auto scaling group back to their original values
Remove the scale-in protection
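The size changes in this procedure can be summarized as a small plan. The sketch below only computes the values to enter for the auto scaling group at each stage; the `GroupSize` type and function names are hypothetical, and applying the sizes is still done in the console or with your own tooling.

```python
from typing import NamedTuple


class GroupSize(NamedTuple):
    minimum: int
    maximum: int
    desired: int


def doubled(size):
    """Return the group size used while old and new nodes run side by side."""
    return GroupSize(size.minimum * 2, size.maximum * 2, size.desired * 2)


def rollout_plan(original):
    """List the group sizes to apply, in order, during the update."""
    return [
        ("scale out", doubled(original)),
        ("scale in", original),
    ]


if __name__ == "__main__":
    for stage, size in rollout_plan(GroupSize(3, 3, 3)):
        print(stage, size)
```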