How to deploy GraphDB in Azure

GraphDB can be deployed on Microsoft Azure by following the general installation instructions. You can find information regarding the costs of running a GraphDB instance on the Azure website.

Tip

We strongly recommend using the provided Terraform module, as it automates the procedure for deploying GraphDB on Azure and mitigates challenges commonly encountered during deployment. You can find the module in our GitHub repository. The Terraform module is also available in the Terraform registry.

Architecture

The architecture diagram shows how GraphDB is deployed on VM scale set instances in the Azure cloud platform. It illustrates the key components and their interactions, giving a high-level view of the system architecture and how it should be deployed.

Diagram representing the architecture of GraphDB deployment in Azure, displaying a Subscription level with Storage Account and App Configuration for backups and settings, respectively. Key Vault for customer keys and DNS Private Zones for internal communication are shown, alongside a Monitor for metrics and logs. Within the Virtual Network, Application Gateway in a subnet offers load balancing, while NAT Gateway routes traffic. Three Availability Zones feature GraphDB VM instances for scalability and reliability.

Note

There are no third-party integration points on the default GraphDB deployment.

Prerequisites

There are several prerequisites for running a GraphDB instance on Azure:

Note

We highly recommend using the GraphDB Terraform module to deploy GraphDB on Azure. The module contains Terraform scripts that fully set up the GraphDB cluster, handle DNS record registration, and more. If you use these template files on their own, you will need to replace the placeholder values of all variables with your actual values. Please note that the scripts are template files and are therefore escaped.

Technical requirements

The following Azure services are required to complete the GraphDB deployment on Azure:

| Service | Description |
| --- | --- |
| Resource Group | Container that holds closely related Azure resources and services that form a solution. |
| Virtual Network | Private virtual network that enables Azure resources to communicate securely with each other. |
| VM scale set | Scalable compute service for creating and managing load balanced VM instances. Used to deploy GraphDB VM images built with the GraphDB Azure Packer scripts, which package GraphDB and the GraphDB external cluster proxy. |
| Managed disks | Block level storage for persistent data attached to VM instances. Used for persistent storage of GraphDB instance data, configurations and log files. |
| Application Gateway | Scalable layer 7 web load balancer managing traffic to applications in Azure. Used to load balance requests to GraphDB’s external cluster proxies running in the VM scale set. |
| NAT Gateway | Gateway for private outbound connectivity to the internet from VM instances. Provides GraphDB VM scale set instances with NAT based internet connectivity without directly exposing them. |
| Key Vault | Secure storage for secret keys and certificates. |
| App Configuration | Service for central storage and management of application settings and feature flags. Used to store GraphDB configurations and license. |
| Storage Account | Secure storage for files and objects. Used for scheduled GraphDB backups that are stored as BLOBs in a storage container. |
| Azure Monitor | Monitoring service that collects and aggregates data, metrics and service logs from different Azure resources. |
| DNS Private Zones | Secure DNS service for private DNS resolution between Azure resources. Used to establish stable network identifiers for GraphDB VM scale set instances. |
| Public IP | Dedicated IP address exposing Azure resources on the internet. Used to expose the Application Gateway and NAT gateway on the internet. |
| Network Security Groups | Security rules restricting the network traffic between Azure resources in an Azure Virtual Network. Used to restrict the traffic between the Virtual Network subnets. |

Required skills

Note

Deploying GraphDB on Azure requires a combination of skills in Azure infrastructure management, database administration, and system troubleshooting. Acquiring these skills may involve hands-on experience, self-study, online resources, and formal training programs such as Azure Fundamentals, offered by Azure or by other educational platforms.

The following skills and knowledge are typically required in order to successfully deploy GraphDB on Azure:

Azure Fundamentals

Familiarity with Microsoft Azure and understanding of its core concepts, such as subscriptions and resource groups, VM instances, virtual networks, security groups and RBAC roles. Knowledge of how to navigate the Azure Portal and interact with Azure resources is essential.

Azure Virtual Networks

Understanding network fundamentals and security, subnets and network security groups. Knowledge of how to set up inbound and outbound traffic rules with NSGs to allow communication with GraphDB.

Azure VM scale sets

Proficiency in creating and managing Azure VM instances. This includes selecting the appropriate machine size, configuring security settings, managing storage (Managed Disks), and understanding VM instance lifecycle management.

Monitoring and Troubleshooting

Proficiency in monitoring the health and performance of GraphDB instances on Azure. Understanding of logging, monitoring and troubleshooting techniques using Azure Monitor, VM instance logs, and GraphDB diagnostic tools.

Linux Administration

Proficiency in Linux command-line interface (CLI) and basic administration tasks. This includes SSH access to the VM instances using Azure Bastion, navigating the file system, managing permissions, installing packages, and configuring system settings.

Database Management

Knowledge of GraphDB and its deployment requirements. Understanding of how to configure GraphDB settings, including database storage, memory allocation, and repository creation.

Database Backup and Recovery

Familiarity with backup and recovery strategies for GraphDB on Azure. Knowledge of Azure services like Storage Account for data backups and restoration processes.

High Availability and Scalability

Knowledge of implementing high availability and scalability for GraphDB on Azure. This may involve using features like VMSS Auto Scaling, load balancers, and multi-Availability Zone (AZ) deployments.

Infrastructure as Code (IaC)

Familiarity with Infrastructure as Code principles and tools like Terraform. This enables automating the provisioning and configuration of GraphDB infrastructure.

Security Best Practices

Understanding of security best practices for Azure deployments, including data encryption, access controls, identity and access management, and compliance considerations.

Creating a Resource Group

Before you start deploying your resources, you need to create a resource group to contain them.

  1. Go to the Azure Portal.

  2. Click on Create a resource.

    Graphic banner showcasing Azure services, including icons and labels for 'Create a resource,' 'Resource groups,' 'Quickstart Center,' and 'Virtual machines' with a right arrow indicating 'More services.' Each icon visually represents the respective service offered by Azure. This image showcases the location of the 'Create a resource' icon in Azure.
  3. Search for Resource group and select it, then click on Create.

    Search result interface in Azure Portal for 'resource group' with filters for pricing and operating system set to 'All.' The interface lists the Resource Group service by Microsoft Azure, emphasizing the capability to manage and deploy resources in an application together. The interface includes a 'Create' button.
  4. Fill in the required information:

    • Resource group name: Provide a unique name.

    • Region (Default region is East US)

      Configuration form for creating a resource group in Azure, displaying fields for subscription and resource group name --- pre-filled with 'rg-example' --- as well as region, set to 'East US.' A navigation pane shows steps for 'Basics,' 'Tags,' and 'Review + Create,' indicating the current stage in the setup process.
  5. Click on Review + Create, and then on Create.

Creating a Virtual Network

  1. Navigate to the Resource Group you just created.

  2. Click on + Create.

  3. Search for Virtual Network and select it, then click on Create.

    Search interface in the Azure Portal showing results for 'virtual network' with an option to add filters. The first result is the Virtual Network service by Microsoft Azure, which allows users to create a logically isolated section in Microsoft Azure and securely connect it outward. The interface includes a 'Create' button.
  4. Fill in the required information:

    • Name

    • Address Space

    Configuration form for a new virtual network in the Azure Portal. Fields for 'Subscription' and 'Resource group' are pre-filled with 'Your Subscription' and 'rg-example', respectively. The form also includes 'Instance details' with a field for 'Virtual network name' filled with 'virtualnetwork-example' and 'Region' set to '(US) East US.' An additional option to 'Deploy to an edge zone' is present, suggesting advanced deployment settings.
  5. Click on Review + Create, and then on Create.
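For readers who prefer scripting to the portal, the resource group and virtual network steps can be approximated with the Azure CLI. The following is a minimal Python sketch that only assembles the `az` argument lists and prints them; it does not run anything. The names `rg-example` and `virtualnetwork-example` are taken from the screenshots, while the `10.0.0.0/16` address space is an assumption inferred from the security rules later in this guide.

```python
import shlex
from typing import List


def resource_group_cmd(name: str, location: str = "eastus") -> List[str]:
    """Build the `az group create` call for the resource group."""
    return ["az", "group", "create", "--name", name, "--location", location]


def virtual_network_cmd(resource_group: str, name: str, address_space: str) -> List[str]:
    """Build the `az network vnet create` call for the virtual network."""
    return [
        "az", "network", "vnet", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--address-prefixes", address_space,
    ]


# Example values: names from the screenshots, address space assumed.
commands = [
    resource_group_cmd("rg-example"),
    virtual_network_cmd("rg-example", "virtualnetwork-example", "10.0.0.0/16"),
]

for cmd in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it possible to review them before running them against a subscription.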

Creating the two subnets

Inside the Virtual Network you just created, you need to create two subnets. One of the subnets will be used for the Virtual machine scale set, and the other will be used for the Application gateway.

  1. Navigate to the Subnets tab inside the Virtual Network you just created.

    Subnets tab within the 'virtualnetwork-example' Virtual Network in the Azure Portal, showing an existing 'default' subnet with an IPv4 address range of 10.0.0.0/24. Options to create a new subnet or gateway subnet are available, as well as functionalities to refresh the list, manage users, or delete existing subnets.

    Note

    You can access the Virtual Network view through the Resource Group.

  2. Click on + Subnet.

  3. Provide a unique Name and Address Range.

  4. Click on OK.

  5. Under Service endpoints, search for the endpoint you need, and select it:

    • When creating the Gateway subnet, select Microsoft.KeyVault.

    • When creating the VMSS subnet, select Microsoft.Storage.

  6. Scroll down to Network policy for private endpoints, and under Private endpoint network policy, select both Network security groups and Route tables.

    Configuration options for a subnet in Azure, showing 'Service endpoints' with 'Microsoft.KeyVault' selected, allowing traffic to Azure Key Vault from the subnet. 'Subnet delegation' is set to 'None,' and 'Network policy for private endpoints' has '2 selected,' indicating custom network policies have been applied to control traffic to private endpoints within the subnet.
  7. Click Save, then repeat the process one more time for the second subnet.

    Subnets listing within the 'virtualnetwork-example' Virtual Network in Azure, showing three subnets: 'default,' 'vmss-subnet-example,' and 'gateway-subnet-example' with their respective IPv4 address ranges and 251 available IPs each. No entries for IPv6, delegation, security groups, or route tables are shown, indicating these subnets are not currently configured with these specific features.
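The address ranges for the two subnets can be planned before they are entered in the portal. The sketch below uses Python's standard `ipaddress` module to carve `/24` subnets out of an assumed `10.0.0.0/16` address space and to check that they do not overlap. The subnet names come from the screenshots; the address space and the choice of `/24` are assumptions. Azure reserves five addresses in every subnet, which is why a `/24` shows 251 available IPs.

```python
import ipaddress
from itertools import combinations

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet

# Assumed address space of the virtual network.
vnet = ipaddress.ip_network("10.0.0.0/16")
candidates = list(vnet.subnets(new_prefix=24))

# Subnet names as they appear in the screenshots.
plan = {
    "default": candidates[0],
    "vmss-subnet-example": candidates[1],
    "gateway-subnet-example": candidates[2],
}


def usable_addresses(subnet: ipaddress.IPv4Network) -> int:
    """Addresses left for resources after Azure's reserved ones."""
    return subnet.num_addresses - AZURE_RESERVED_PER_SUBNET


def has_overlap(subnets) -> bool:
    """True if any two subnets share addresses."""
    return any(a.overlaps(b) for a, b in combinations(subnets, 2))


for name, subnet in plan.items():
    print(f"{name}: {subnet} ({usable_addresses(subnet)} usable addresses)")
print("overlap detected:", has_overlap(plan.values()))
```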

Creating an Application Security Group (ASG)

  1. Navigate to your Resource Group and click on + Create.

  2. Search for Application Security Group and select it, then click on Create.

  3. Provide a unique name for your ASG.

  4. Click on Review + Create, and then on Create.

Creating the two Network Security Groups (NSG)

You need to create two Network Security Groups (NSGs) — one for the VMSS, and one for the Gateway.

  1. Navigate to your Resource Group and click on + Create.

  2. Search for Network Security Group and select it, then click on Create.

  3. Provide a unique name for your NSG.

    The configuration screen for setting up a new Network Security Group (NSG) within the Azure Portal. The form shows 'Project details' with 'Subscription' and 'Resource group' fields populated as 'Your Subscription' and 'rg-example', respectively. 'Instance details' are partially visible with a field for 'Name' filled in as 'vmss-nsg-example' and 'Region' set to 'East US'. This setup screen is part of the process for defining NSGs for a virtual machine scale set (VMSS) in Azure.
  4. Click on Review + Create, and then on Create.

  5. Repeat the process one more time for the second NSG.

After both NSGs have been deployed, you need to associate each of them with its respective subnet.

  1. Navigate to the newly deployed NSG, and go to Subnets.

  2. Click on Associate.

  3. Select the virtual network you created earlier from the drop-down menu, and then select the respective subnet.

Configuring the security rules for the VMSS NSG

After you create the NSGs, you also need to configure the security rules for both of them. The VMSS NSG has both Inbound and Outbound security rules. You can use the table below for reference for each individual rule.

| Rule | Source | Source configuration | Source port ranges | Destination | Destination configuration | Service | Destination port ranges | Protocol | Action | Recommended priority |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Inbound rule #1 | IP Addresses | IP Address: 10.0.0.0/16 | 7200-7201 | Application security group | The ASG you created. | Custom | 7200-7201 | TCP | Allow | 100 |
| Inbound rule #2 | Application security group | The ASG you created. | * | Application security group | The ASG you created. | Custom | 7200-7201 | TCP | Allow | 200 |
| Inbound rule #3 | Application security group | The ASG you created. | * | Application security group | The ASG you created. | Custom | 7300-7301 | TCP | Allow | 210 |
| Outbound rule #1 | Application security group | The ASG you created. | * | Service tag | Destination service tag: Internet | Custom | * | Any | Allow | 100 |
| Outbound rule #2 | Application security group | The ASG you created. | * | Application security group | The ASG you created. | Custom | 7200-7201 | TCP | Allow | 200 |
| Outbound rule #3 | Application security group | The ASG you created. | * | Application security group | The ASG you created. | Custom | 7300-7301 | TCP | Allow | 210 |
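Before entering the VMSS rules in the portal, it can help to write them down as data and check them. The sketch below encodes the six rules from the table and verifies two Azure constraints: custom rule priorities must lie between 100 and 4096, and a priority may be used only once per direction within an NSG. The `Rule` class, the `find_problems` helper and the ASG name `graphdb-asg` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List

ASG = "graphdb-asg"  # hypothetical name standing in for "the ASG you created"


@dataclass(frozen=True)
class Rule:
    name: str
    direction: str  # "Inbound" or "Outbound"
    source: str
    source_ports: str
    destination: str
    destination_ports: str
    protocol: str
    action: str
    priority: int


# The six VMSS NSG rules, copied from the table above.
VMSS_RULES = [
    Rule("Inbound rule #1", "Inbound", "10.0.0.0/16", "7200-7201", ASG, "7200-7201", "TCP", "Allow", 100),
    Rule("Inbound rule #2", "Inbound", ASG, "*", ASG, "7200-7201", "TCP", "Allow", 200),
    Rule("Inbound rule #3", "Inbound", ASG, "*", ASG, "7300-7301", "TCP", "Allow", 210),
    Rule("Outbound rule #1", "Outbound", ASG, "*", "Internet", "*", "Any", "Allow", 100),
    Rule("Outbound rule #2", "Outbound", ASG, "*", ASG, "7200-7201", "TCP", "Allow", 200),
    Rule("Outbound rule #3", "Outbound", ASG, "*", ASG, "7300-7301", "TCP", "Allow", 210),
]


def find_problems(rules: List[Rule]) -> List[str]:
    """Return human-readable problems; an empty list means the set is consistent."""
    problems = []
    seen = {}
    for rule in rules:
        if not 100 <= rule.priority <= 4096:
            problems.append(f"{rule.name}: priority {rule.priority} is outside 100-4096")
        key = (rule.direction, rule.priority)
        if key in seen:
            problems.append(f"{rule.name}: priority clashes with {seen[key]}")
        else:
            seen[key] = rule.name
    return problems


print(find_problems(VMSS_RULES) or "no problems found")
```

Inbound and outbound rules may share a priority value, as rule #1 does in both directions, because the uniqueness check is applied per direction.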

Configuring the security rules for the Gateway NSG

After you configure the security rules for the VMSS NSG, you also need to configure the security rules for the Gateway NSG. It requires only Inbound rules. You can use the table below for reference for each individual rule.

| Rule | Source | Source service tag | Source port ranges | Destination | Destination configuration | Service | Destination port ranges | Protocol | Action | Recommended priority |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Inbound rule #1 | Service tag | GatewayManager | * | Any | N/A | Custom | 65200-65535 | TCP | Allow | 100 |
| Inbound rule #2 | Service tag | AzureLoadBalancer | * | IP Addresses | IP Address: 10.0.0.0/24 | Custom | 8080 | TCP | Allow | 110 |
| Inbound rule #3 | Service tag | Internet | * | IP Addresses | IP Address: 10.0.0.0/24 | HTTP | 80 | TCP | Allow | 120 |
| Inbound rule #4 | Service tag | Internet | * | IP Addresses | IP Address: 10.0.0.0/24 | HTTPS | 443 | TCP | Allow | 130 |
| Inbound rule #5 | Any | N/A | * | IP Addresses | IP Address: 10.0.0.0/24 | Custom | 8080 | Any | Deny | 4000 |
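The interplay between the Allow rule at priority 110 and the Deny rule at priority 4000 is easier to see with a small simulation. NSG rules are evaluated in ascending priority order and the first match decides the outcome. The sketch below is a deliberately simplified model that matches only on source service tag and destination port. It ignores protocol, destination address and Azure's built-in default rules, and the `InboundRule` type and `decide` function are hypothetical.

```python
from typing import List, NamedTuple, Optional


class InboundRule(NamedTuple):
    priority: int
    source: str  # service tag, or "Any"
    port_low: int
    port_high: int
    action: str


# The five Gateway NSG inbound rules from the table above.
GATEWAY_RULES = [
    InboundRule(100, "GatewayManager", 65200, 65535, "Allow"),
    InboundRule(110, "AzureLoadBalancer", 8080, 8080, "Allow"),
    InboundRule(120, "Internet", 80, 80, "Allow"),
    InboundRule(130, "Internet", 443, 443, "Allow"),
    InboundRule(4000, "Any", 8080, 8080, "Deny"),
]


def decide(source: str, port: int, rules: List[InboundRule] = GATEWAY_RULES) -> Optional[str]:
    """Return the action of the first matching rule, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        source_matches = rule.source in ("Any", source)
        port_matches = rule.port_low <= port <= rule.port_high
        if source_matches and port_matches:
            return rule.action
    return None  # no custom rule matched; Azure's default rules would apply


print(decide("AzureLoadBalancer", 8080))  # matched by the rule at priority 110
print(decide("Internet", 8080))           # falls through to the rule at priority 4000
```

In this model, port 8080 is reachable only from the `AzureLoadBalancer` service tag, while every other source is caught by the final Deny rule.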

Creating the two Public IP addresses

You need to create two public IP addresses — one for the Application gateway, and one for the NAT gateway.

  1. Navigate to your Resource Group and click on + Create.

  2. Search for Public IP Address and select it, then click on Create.

  3. Fill in the required information:

    • Name: Provide a unique name.

    • Zone: Zone redundant (recommended).

    • Assignment: Static or Dynamic (depending on your needs).

  4. Click on Review + Create, and then on Create.

  5. Repeat the process one more time for the second Public IP address.

Note

The idle connection timeout is 5 minutes (the Azure default is 4 minutes). To prevent timeouts, we recommend sending Keep-Alive messages with your requests.
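One way to follow this recommendation on the client side is to enable TCP keepalive probes on long-lived connections, so that an idle connection still produces traffic before the timeout expires. The sketch below assumes that TCP-level keepalive is an acceptable reading of the note; the `enable_keepalive` helper and the timing values are hypothetical, chosen only to stay well under the 5-minute limit.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_seconds: int = 120,
                     interval_seconds: int = 30, probes: int = 4) -> socket.socket:
    """Turn on TCP keepalive so an idle connection keeps producing traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are platform-specific, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between subsequent probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


# Demonstration on an unconnected socket; no network traffic is generated.
demo = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print("keepalive enabled:", demo.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
demo.close()
```

Many HTTP client libraries expose their underlying sockets or offer their own keepalive settings, so the same idea can usually be applied without handling raw sockets directly.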

Creating the NAT gateway

  1. Navigate to your Resource Group and click on + Create.

  2. Search for NAT Gateway and select it, then click on Create and provide a unique name.

  3. Under Outbound IP, select the public IP you just created.

    Configuration panel for Outbound IP settings in the Azure Portal for a NAT gateway setup. The section shows a dropdown for 'Public IP addresses' with one selected: 'public-ip-nat-example (4.255.57.11)'. Options to create a new public IP address and public IP prefix are also available, indicating the user can allocate additional outbound IPs for the NAT gateway.
  4. Under Subnet, select the virtual network created earlier from the drop-down menu, then select the VMSS subnet.

    Subnet association panel for a NAT gateway in Azure, displaying the 'virtualnetwork-example' with selectable subnets below. The 'vmss-subnet-example' is selected with its corresponding address range, '10.0.1.0/24', indicating it is designated for the VMSS. Options are available to manage subnets further, with hints that subnets can be added or removed after creating the NAT gateway.
  5. Click on Review + Create, and then on Create.

Creating a User-Assigned Managed Identity

  1. Navigate to your Resource Group and click on + Create.

  2. Search for User-Assigned Managed Identity and select it, then click on Create.

  3. Provide a unique name for the User-assigned managed identity.

  4. Click on Review + Create, and then on Create.

Note

  • Make sure that the identity has sufficient rights to “Create” and “Read”.

  • You will need to assign the identity to the DNS Private Zone.

Creating the Storage Account

  1. Navigate to your Resource Group and click on + Create.

  2. Search for Storage Account and select it, then click on Create.

  3. Under Networking, select the virtual network and choose the VMSS subnet.

    Networking tab for a Storage Account creation in Azure, showing options for Network connectivity. 'Enable public access from selected virtual networks and IP addresses' is chosen. Under Virtual networks, 'Your Subscription' is selected, and the virtual network 'virtualnetwork-example' is specified with the 'vmss-subnet-example' subnet chosen. A notice indicates that a 'Microsoft.Storage' service endpoint will be added to the subnet.
  4. Under Data Protection:

    • Check Enable soft delete for blobs, with Days to retain deleted blobs: 31.

    • Check Enable soft delete for containers, with Days to retain deleted containers: 31.

    • Check Enable blob change feed, and select Delete change feed logs after (in days): 31.

    • Uncheck everything else.

  5. Click on Review + Create, and then on Create.

After the Storage Account has been deployed:

  1. Navigate to the newly created Storage Account, and go to Containers.

  2. Create containers for backup.

Creating the Application Gateway

  1. Navigate to your Resource Group and click on + Create.

  2. Search for Application Gateway and select it, then click on Create.

  3. Set the following instance details:

    • Minimum instances: 1

    • Maximum instances: 2

    • Availability zone: Select all 3 availability zones.

    Instance details panel for creating an Application Gateway in Azure, named 'application-gateway-example'. The region is set to 'East US', with the tier selected as 'Standard V2'. Autoscaling is enabled, with a minimum instance count of 1 and a maximum of 2. All three availability zones are selected for deployment, and HTTP2 is enabled.
  4. Under Configure virtual network, select the gateway subnet.

    Virtual network configuration section for an Azure Application Gateway setup. The virtual network 'virtualnetwork-example' is selected along with a specific subnet for the gateway, 'gateway-subnet-example (10.0.2.0/24)'. An option to manage subnet configuration is also visible.
  5. Under Frontends, select the gateway IP created earlier.

  6. Under Backends, add a backend pool with targets.

    Note

    Once you create and properly configure the VMSS, Azure should automatically add it as a target to the backend pool.

  7. Under Configuration, define the Rule, Listener and Backend targets.

    • Configurations for the Listener:

      • Protocol: HTTPS

      • Port: 443

      • Certificate: Upload your certificate.

      Configuration interface for a listener in Azure Application Gateway. The listener is named 'gateway-https-request-listener-example' and set up for the HTTPS protocol on frontend IP 'Public' with port 443. HTTPS settings specify to 'Upload a certificate' with 'certificate-example' as the cert name and 'certificate-example.pfx' as the PFX certificate file. Password entry for the certificate is provided, and the listener type is set to 'Basic'. Options for custom error pages, such as 'Bad Gateway - 502' and 'Forbidden - 403', are available for further configuration.
    • Configurations for the Backend targets:

      • Select the Backend pool you just created, and add new settings, as follows:

      • Protocol: HTTP

      • Port: 7201

      • Request time-out: 86400.

        Note

        86400 is the maximum number of seconds allowed. Such a long time is necessary for GraphDB to be able to operate.

      • Override backend path: /

      • Override with new host name: No

      Configuration screen for backend targets of an Azure Application Gateway. The screen shows options to select a target type, with 'Backend pool' chosen and a drop-down menu displaying 'backend-pool-example'. Backend settings indicate 'backend-setting-example' is selected. Path-based routing section is present but shows no additional targets, implying a need for further setup for any path-based routing rules. Configuration interface for backend settings named 'backend-setting-example' in Azure Application Gateway. The backend protocol selected is HTTP, the backend port is set to 7201. Additional settings include request time-out configured for 86400 seconds, backend path override set to '/', and host name override is set to 'No'. Options for cookie-based affinity, connection draining, and custom probes are present, with affinity and draining currently disabled.
  8. Set a high priority for the rule.

    Tip

    We recommend a value of 100.

  9. Review and create the resource.

Warning

Deploying the application gateway can take longer than most of the other resources.

Creating the Virtual Machine Scale Set (VMSS)

Now that all other required resources have been deployed, the next step is to create the Virtual Machine Scale Set (VMSS), where each instance functions as a cluster node. VMSS enables the deployment and management of autoscaled, identical virtual machines, allowing you to dynamically scale in or out.

Additionally, this is where you can add a custom user data script if you want to use one.

Tip

You can use the custom user data script that is part of the Terraform module for reference when creating your script.

  1. Navigate to your Resource Group and click on + Create.

  2. Search for Virtual Machine Scale Set and select it, then click on Create.

  3. Under Scale set details, provide a unique name and select all necessary availability zones.

  4. Under Orchestration, choose Uniform.

  5. Under Instance details, click on See all images and search for the GraphDB image in the Marketplace.

    Marketplace search result in Azure, showing the GraphDB Enterprise Edition by Ontotext AD. It is described as a robust triplestore designed to handle large-scale RDF data, suitable for knowledge graphs and linked data applications. An option to select the virtual machine image is available for deployment.
  6. For each availability zone, add a managed disk. The recommended disk is a 500 GB Premium SSDv2 with 7500 IOPS and 250 MB/s throughput.

    Warning

    At the end of the deployment process, you must attach a disk to each VMSS instance as LUN 2.

    Disk creation overview for an Azure VMSS, detailing a new disk named 'vmss-example_DataDisk_0'. The source type is set to 'None (empty disk)' and the size specified is 500 GiB on a Premium SSDv2 with 7500 IOPS and 250 throughput.
  7. Add the scale set to the virtual network created earlier and select the VMSS subnet.

    • Under Load balancing options, select the Application Gateway created earlier.

  8. Under Management, add the user-assigned managed identity you created earlier.

  9. If you want to use a cluster, we recommend a value of 3, 5 or 7 under Scaling.

  10. Enable health monitoring with the following configuration:

    • Type: Application health extension

    • Protocol: HTTP

    • Port number: 7200

    • Path: /rest/cluster/node/status

  11. If you want to use a user data script, you can add it under Advanced.

    Tip

    You can use the custom user data script that is part of the Terraform module, available in the Ontotext-AD GitHub repository, for reference when creating your script.

  12. Review and create the resource.
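The health monitoring settings from step 10 can also be checked by hand, for example from a machine inside the virtual network. The sketch below builds the probe URL from the protocol, port and path listed above and reports whether a node answers with HTTP 200. The `health_url` and `node_is_healthy` helpers are hypothetical, and treating exactly status 200 as healthy is an assumption about how the probe is evaluated.

```python
import urllib.error
import urllib.request


def health_url(host: str, port: int = 7200, path: str = "/rest/cluster/node/status") -> str:
    """Build the health probe URL from the settings in step 10."""
    return f"http://{host}:{port}{path}"


def node_is_healthy(host: str, timeout: float = 5.0) -> bool:
    """Return True if the node answers the probe URL with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or a non-2xx answer.
        return False


# 10.0.1.4 is a made-up private address inside the VMSS subnet.
print(health_url("10.0.1.4"))
```

Calling `node_is_healthy("10.0.1.4")` performs the actual request, so it only makes sense from a host that can reach the VMSS subnet.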

Attaching the managed disks

After you have deployed your VMSS, you need to attach a disk to each of your VMSS instances.

Hint

The disk management script from the Terraform module can be used as a reference for attaching your managed disks.

  1. Navigate to the newly created VMSS, and go to Instances.

  2. Open the instance you want to attach a managed disk to.

  3. Under Disks, click on Attach existing disks.

  4. Add the disk you created earlier with LUN 2.
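With several instances, attaching the disks one by one in the portal becomes repetitive. The sketch below assembles one `az vmss disk attach` command per instance, always with LUN 2, and prints them without executing anything. The VMSS name, the instance IDs and the disk names other than `vmss-example_DataDisk_0` are hypothetical. Each disk has to be in the same availability zone as the instance it is attached to.

```python
import shlex
from typing import List

GRAPHDB_DATA_LUN = 2  # the LUN required by this guide


def attach_disk_cmd(resource_group: str, vmss_name: str,
                    instance_id: int, disk: str) -> List[str]:
    """Build the `az vmss disk attach` call for one VMSS instance."""
    return [
        "az", "vmss", "disk", "attach",
        "--resource-group", resource_group,
        "--vmss-name", vmss_name,
        "--instance-id", str(instance_id),
        "--disk", disk,
        "--lun", str(GRAPHDB_DATA_LUN),
    ]


# Hypothetical mapping of instance ID to the disk created in the same zone.
instance_disks = {
    0: "vmss-example_DataDisk_0",
    1: "vmss-example_DataDisk_1",
    2: "vmss-example_DataDisk_2",
}

attach_commands = [
    attach_disk_cmd("rg-example", "vmss-example", instance_id, disk)
    for instance_id, disk in sorted(instance_disks.items())
]

for cmd in attach_commands:
    print(shlex.join(cmd))
```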

Setting up and managing your cluster

Once all the resources are installed and configured correctly, you can create a cluster by accessing Bastion and using the REST API.

Warning

If one of your instances dies or becomes unavailable, you will need to manually remove the node and add a new replacement node.
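As an illustration of the REST call, the sketch below prepares a cluster creation request without sending it. It assumes the GraphDB cluster endpoint `POST /rest/cluster/config` with a JSON body that lists the node RPC addresses on port 7300; verify the endpoint and body against the cluster documentation for your GraphDB version. The host names are hypothetical stand-ins for the records in your DNS Private Zone, and authentication is left out.

```python
import json
import urllib.request
from typing import Dict, List

HTTP_PORT = 7200  # GraphDB HTTP port, as used by the health probe
RPC_PORT = 7300   # assumed cluster RPC port, matching the 7300-7301 NSG rules


def cluster_config_payload(hostnames: List[str]) -> Dict[str, List[str]]:
    """Build the request body that lists every node's RPC address."""
    return {"nodes": [f"{host}:{RPC_PORT}" for host in hostnames]}


def create_cluster_request(first_node: str, hostnames: List[str]) -> urllib.request.Request:
    """Prepare, but do not send, the cluster creation request."""
    body = json.dumps(cluster_config_payload(hostnames)).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{first_node}:{HTTP_PORT}/rest/cluster/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical private DNS names of the three VMSS instances.
nodes = ["node-1.graphdb.example", "node-2.graphdb.example", "node-3.graphdb.example"]
request = create_cluster_request(nodes[0], nodes)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` from the Bastion session would send it; printing it first makes it easy to confirm the node list.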

Accessing the Workbench through the Application Gateway

After all the resources are installed, the cluster has been created and all the instances are healthy, the Workbench can be accessed from the Application Gateway.

Warning

You cannot use the Workbench interface to create clusters in an Azure deployment. If you have not created a cluster yet, the Workbench will not load.

Scaling the VMSS out and back in

  1. Navigate to the Resource group containing your VMSS in the Azure Portal.

  2. Go inside the VMSS and select Scaling in the navigation pane.

  3. Under Configure, choose Manual scale or Custom autoscale based on your preference.

  4. Under Scale-in Policy, select NewestVM, which balances across the availability zones and then deletes the most recently created virtual machines.

  5. Adjust the Instance count to the desired number of instances.

    Note

    If you require a GraphDB Cluster, the instance count is recommended to be 3, 5 or 7.

  6. Click on Save to apply the scaling configuration.
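The recommendation of 3, 5 or 7 instances can be motivated with a short calculation. Assuming the cluster needs a majority of nodes to agree, as consensus-based clusters generally do, an even node count adds cost without tolerating more failures. The sketch below computes the majority size and the number of tolerated failures, and assembles the equivalent `az vmss scale` command without running it. The helper names and the `vmss-example` name are hypothetical.

```python
import shlex
from typing import List


def quorum(node_count: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return node_count // 2 + 1


def tolerated_failures(node_count: int) -> int:
    """Nodes that can be lost while a majority remains."""
    return node_count - quorum(node_count)


def scale_cmd(resource_group: str, vmss_name: str, node_count: int) -> List[str]:
    """Build the `az vmss scale` call for a manual scale operation."""
    return [
        "az", "vmss", "scale",
        "--resource-group", resource_group,
        "--name", vmss_name,
        "--new-capacity", str(node_count),
    ]


for count in (3, 4, 5, 7):
    print(f"{count} nodes: quorum {quorum(count)}, tolerates {tolerated_failures(count)} failure(s)")

print(shlex.join(scale_cmd("rg-example", "vmss-example", 5)))
```

Four nodes tolerate a single failure, the same as three, which is why the odd counts are the ones recommended.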