SQL Access over JDBC

As a data scientist or an engineer with experience in specific SQL-based tools, you might want to consume RDF data from your knowledge graph or other RDF databases by accessing GraphDB via a BI tool of your choice (e.g., Tableau or Microsoft Power BI). This capability is provided by GraphDB’s JDBC driver, which enables you to create SQL views using SPARQL SELECT queries, and to access all GraphDB features, including plugins and SPARQL federation. The functionality is based on Apache Calcite, which handles the SQL parsing as well as the optimizations and mappings to SPARQL.

The JDBC driver works with preconfigured SQL views (tables) that are saved under each repository whose data you want to access. To simplify the table creation process, the SQL View Manager is integrated in the GraphDB Workbench. It allows you to configure, store, update, preview, and delete the SQL views used with the JDBC driver. Each SQL view is based on a SPARQL SELECT query and requires additional metadata to configure the SQL columns.

Important

With this functionality, you can only read data from the repository; write operations are not supported.
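
Once a SQL view is configured, any JDBC-capable client can read from it like a regular SQL table. The following minimal Java sketch illustrates this, assuming the driver .jar is on the classpath, GraphDB is running at http://localhost:7200, and a SQL view named restaurant_data exists (both are set up later in this guide); the class name is ours, and everything else follows the connection settings used throughout this page:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GraphDbJdbcDemo {
    public static void main(String[] args) throws Exception {
        // Connection string format used throughout this guide; adjust the
        // endpoint URL if your GraphDB instance is not running locally.
        String url = "jdbc:graphdb:url=http://localhost:7200";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // restaurant_data is the SQL view created later in this guide.
             // Depending on the client, you may need to qualify the table
             // with the repository name as its schema.
             ResultSet rs = stmt.executeQuery(
                     "SELECT restaurant_name FROM restaurant_data")) {
            while (rs.next()) {
                System.out.println(rs.getString("restaurant_name"));
            }
        }
    }
}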

Configuration

Prerequisites

You need to download the GraphDB JDBC driver (graphdb-jdbc-remote-9.11.0.jar), a self-contained .jar file. The driver needs to be installed according to the requirements of the JDBC-capable software you are using. See below for specific instructions.

For the purposes of this guide, we will be using the Netherlands restaurants RDF dataset. Upload it into a GraphDB repository, name it nl_restaurants, and set it as the active repository.

Now, let’s access its data over the JDBC driver.

Creating a SQL view

  1. Go to Setup ‣ JDBC. Initially, the list of SQL table configurations will be empty, as none have been configured yet.

  2. Click Create new SQL table configuration.

    In the view that opens, there are two tabs:

    • Data query: The editor where you enter the SPARQL SELECT query that is exposed as a SQL view through the JDBC driver. By default, it opens with a simple SPARQL query that defines two columns based on rdfs:label: id and label.

      _images/jdbc-default-data-query.png

      Note

      The query contains a special comment in the query body (# !filter) that specifies the position where the filter clause generated on the SQL side will be inserted. Make sure that it is spelled in lowercase, as otherwise the query parser will not recognize it.

    • Column types: Here, you can configure the SQL column types and other metadata of the SQL table. Hover over a field or a checkbox to see more information about it in a tooltip.

      Note that in order to create a table, it must contain at least one column.

      _images/jdbc-default-column-types.png
  3. Fill in a Table name for your table, e.g., restaurant_data. This field is mandatory and cannot be changed once the table has been created.

  4. Now, let’s edit the SPARQL SELECT query in the Data query body.

    Enter the following query in the editor:

    PREFIX ex:<http://example.com/ex>
    PREFIX base:<http://example/base/>
    PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

    select ?restaurant_name ?short_description ?long_description ?calendar where {
        ?s a base:Restaurant;
            rdfs:label ?restaurant_name;
            ex:shortDescription ?short_description;
            ex:longDescription ?long_description;
            ex:calendar ?calendar.
        # !filter
    }
    
  5. After adding the SPARQL SELECT query, go to the Column types tab and click the Suggest button. This will generate all possible columns based on the bindings in the SELECT query. Additionally, SQL types will be suggested based on the xsd types in the first 100 results of the input query:

    _images/jdbc-column-type-config.png
  6. Here, you can:

    • Update the SQL type of each column. This is the only mandatory field.

    • Configure the precision of the SQL type if applicable (e.g., decimal).

    • Make a column NOT NULL (default is Nullable).

    • Provide a Literal type or language tag for SPARQL FILTER.

  7. You can also remove a column from the configuration with the delete icon on the right. If you want to add it back later, click the Suggest button again, which will re-add it and suggest its type.

  8. After configuring the table columns, return to the Data query tab and preview the table that the view would return. The configuration does not need to be saved in order to be previewed.

    _images/jdbc-query-preview.png

    Note

    If you click Cancel before saving, a warning will notify you that you have unsaved changes.

  9. After successfully configuring the SQL view, we can Save it. It will appear in the list of configured tables that can be used with the JDBC driver.

For the purposes of the BI tool examples further below, let’s also create another SQL view with the following query:

PREFIX ex:<http://example.com/ex>
PREFIX base:<http://example/base/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

select ?restaurant_name ?city ?country ?address ?zipcode ?latitude ?longitude where {
    ?s a base:Restaurant;
        rdfs:label ?restaurant_name;
        ex:inCity ?city_id;
        ex:address ?address;
        ex:zipcode ?zipcode;
        ex:latitude ?latitude;
        ex:longitude ?longitude.

    ?city_id rdfs:label ?city.
    ?city_id ex:in ?country_id.
    ?country_id rdfs:label ?country.
    # !filter
}

Name it restaurant_location and save it.

Updating a SQL view

To edit and update a SQL view, select it from the list of SQL views configured for the selected repository. The configuration works the same way as during creation, except that you cannot change the name of the SQL view. You can edit and update the query and the SQL column metadata.

After updating the configuration, you can Save and see that all changes have been reflected.

Deleting a SQL view

To delete a SQL view, click the delete icon next to its name in the available SQL views list.

_images/jdbc-delete-sql-table.png

Type mapping

The following table shows all supported column (metadata) types, their SQL type equivalents, and the conversion (or mapping) of RDF to SQL values.

Metadata type | SQL type  | Default precision and scale | RDF to SQL             | Default RDF type in FILTER()
--------------|-----------|-----------------------------|------------------------|-------------------------------------------
string        | VARCHAR   | 1,000                       | Literal.stringValue()  | plain literal or literal with language tag
IRI           | VARCHAR   | 500                         | IRI.stringValue()      | IRI
boolean       | BOOLEAN   |                             | Literal.booleanValue() | literal with xsd:boolean
byte          | BYTE      |                             | Literal.byteValue()    | literal with xsd:byte
short         | SHORT     |                             | Literal.shortValue()   | literal with xsd:short
int           | INT       |                             | Literal.intValue()     | literal with xsd:int
long          | LONG      |                             | Literal.longValue()    | literal with xsd:long
float         | FLOAT     |                             | Literal.floatValue()   | literal with xsd:float
double        | DOUBLE    |                             | Literal.doubleValue()  | literal with xsd:double
decimal       | DECIMAL   | 19, 0                       | Literal.decimalValue() | literal with xsd:decimal
date          | DATE      |                             | See below              | literal with xsd:date, no timezone
time          | TIME      |                             | See below              | literal with xsd:time, no timezone
timestamp     | TIMESTAMP |                             | See below              | literal with xsd:dateTime, no timezone

Each metadata type may be followed by optional precision and scale in parentheses, e.g., decimal(15,2) or string(100), and by an optional nullability specification consisting of the literal null or not null. By default, all columns are nullable.
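
For instance, the following # !column comments are all valid (price is a hypothetical column used only to illustrate precision and scale; the others come from the examples above):

# !column : restaurant_name : string not null
# !column : city : string(100)
# !column : price : decimal(15,2) null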

RDF values are converted to SQL values on a best-effort basis. For example, if a column was specified as “long” in SQL, a value will convert to a long if the corresponding literal looks like a long number, regardless of its datatype. If the conversion fails (e.g., “foo” cannot be parsed as a long value), the SQL value will become null.

The default RDF type is used only to construct values when a condition from SQL WHERE is pushed to a SPARQL FILTER().

Dates, times, and timestamps are tricky, as these SQL types have no timezone support. SQL types with timezone support do exist, but they are not fully implemented in Calcite. To cover the most common use case, we proceed as follows:

  • Ignore the timezone on date and time literals.

    Dates such as 2020-07-01, 2020-07-01Z, 2020-07-01+03:00, and 2020-07-01-03:00 will all be converted to 2020-07-01.

    Times such as 12:00:01, 12:00:01Z, 12:00:01+03:00, and 12:00:01-03:00 will all be converted to 12:00:01.

    No timezone will be added when constructing a value for filtering.

  • On datetime values, we consider “no timezone” to be equivalent to “Z” (i.e., +00:00); all other timezones will be converted by adjusting the datetime value by the respective offset.

    No timezone will be added when constructing a value for filtering.

WHERE to FILTER conversion

The following SQL operators are converted to FILTER and pushed to SPARQL, if possible:

  • Comparison: =, <>, <, <=, >, >=

  • Nullability: IS NULL, IS NOT NULL

  • Text search: LIKE, SIMILAR TO

The conversion happens only if one of the operands is a column and the other one is a constant.
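
As an illustration, the following Java sketch (reusing the restaurant_location view defined earlier; the value “Amsterdam” is just an assumed example) issues a query whose WHERE conditions are all eligible for pushdown, since each one compares a column to a constant:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FilterPushdownDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:graphdb:url=http://localhost:7200");
             Statement stmt = conn.createStatement();
             // Both conditions compare a column to a constant, so they can
             // be pushed down to a SPARQL FILTER() inserted at the position
             // of the # !filter comment in the view's SPARQL query.
             ResultSet rs = stmt.executeQuery(
                     "SELECT restaurant_name, zipcode FROM restaurant_location "
                             + "WHERE city = 'Amsterdam' AND zipcode IS NOT NULL")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "  " + rs.getString(2));
            }
        }
    }
}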

Table verification

We can also use an external tool such as the SQuirreL SQL Client to verify that the SQL tables we created through the Workbench are functioning properly.

After installing it, execute the following steps:

  1. Download the GraphDB JDBC driver (graphdb-jdbc-remote-9.11.0.jar), a self-contained .jar file.

  2. Open SQuirreL and add the JDBC driver: go to the Drivers tab on the left, and click the + icon to create a new driver.

  3. In the dialog window, select Extra Class Path and click Add.

  4. Go to the driver’s location on your computer, select it, and click Choose.

  5. In the Name field, choose a name for the driver, e.g., GraphDB.

  6. For Example URL, enter the string jdbc:graphdb:url=http://localhost:7200 (or the respective endpoint URL if your repository is in a remote location).

  7. For Class Name, enter com.ontotext.graphdb.jdbc.remote.Driver. Click OK.

    _images/squirrel-add-jdbc-driver.png
  8. Now go to the Aliases tab on the left, and again click the + icon to create a new alias.

  9. The newly created driver and its URL will be visible in the dialog window. Choose a name for the alias, e.g., GraphDB localhost. The username “admin” and password “root” are only necessary if GraphDB security is enabled.

    _images/squirrel-add-alias.png
  10. You can now see your repository with the two tables that it contains:

    _images/squirrel-view-tables.png
  11. In the SQL tab, you can see information about the tables, such as their content. Write your SQL query in the empty field and hit Ctrl+Enter (or the Run SQL icon above):

    _images/squirrel-sql-results.png

    You can also see the metadata:

    _images/squirrel-sql-metadata.png

Usage examples

Tableau

Now let’s consume your RDF data as SQL in Tableau:

  1. Download the GraphDB JDBC driver (graphdb-jdbc-remote-9.11.0.jar).

  2. Place it in the Tableau drivers directory corresponding to your operating system:

    • Windows: C:\Program Files\Tableau\Drivers

    • macOS: ~/Library/Tableau/Drivers

  3. Start Tableau and go to Connect ‣ Other Databases (JDBC).

  4. Enter the JDBC connection string in the URL field: jdbc:graphdb:url=http://localhost:7200 (or the respective endpoint URL if your repository is in a remote location).

    _images/tableau-connect-jdbc.png
  5. On the next screen, under Databases you will see GraphDB. Select it.

  6. In the Schema drop-down menu, you should see the name of the GraphDB repository, in our case NL_Restaurants. Select it.

  7. Tableau is now showing the SQL tables that we created earlier - restaurant_data and restaurant_location.

  8. Drag the Restaurant_Location table into the field in the center of the screen and click Update Now.

    _images/tableau-table-screen.png
  9. Go to Sheet 1 where we will visualize the restaurants in the dataset based on:

    1. their location:

      1. On the left side of the screen, select the parameters: Country, City, Restaurant_Name, Zipcode.

      2. On the right side of the screen, select the symbol maps option.

        _images/tableau-location.png
      3. Drag the Restaurant_Name parameter, which is now in the Rows field, into Marks ‣ Colors.

        The resulting map should look like this:

      _images/tableau-restaurants-location.png
    2. the number of restaurants in a given location:

      1. On the left side of the screen, select the parameters: Country, City, Restaurant_Name.

      2. On the right side of the screen, again select the symbol maps option.

      3. Drag the Restaurant_Name parameter, which is now in the Rows field, into Marks ‣ Size.

        The resulting map should look like this:

      _images/tableau-restaurants-size.png

Microsoft Power BI over ODBC protocol

When working with BI tools that do not support JDBC, as is the case with Microsoft Power BI, you need to use an ODBC-JDBC bridge, e.g., Easysoft’s ODBC-JDBC Gateway.

After downloading and installing the gateway in your Windows operating system, connect it to GraphDB the following way:

  1. Download the GraphDB JDBC driver (graphdb-jdbc-remote-9.11.0.jar).

  2. From the Windows Start menu, open ODBC Data Sources (64-bit).

  3. In the dialog window, go to System DSN and click Add.

  4. In the next window, select Easysoft ODBC-JDBC Gateway and click Finish.

  5. In the next window, we will configure the connection to GraphDB:

    • In the DSN field, enter the name of the new data source, for example “GraphDB-Test”. The Description field is optional.

    • For User Name, enter “admin”, and for Password, “root”. These are only mandatory when GraphDB security is enabled.

    • For Driver Class, enter com.ontotext.graphdb.jdbc.remote.Driver.

    • For Class Path, click Add and go to the location of the driver’s .jar file on your computer. Select it and click Open.

    • For URL, enter the same string as in the Tableau example above: jdbc:graphdb:url=http://localhost:7200 (or the respective endpoint URL if your repository is in a remote location).

    _images/odbc-gateway-configuration.png
  6. Click Test to make sure that the connection is working, then click OK.

  7. In the previous dialog window, you should now see the GraphDB-Test connection.

This concludes the gateway configuration, and we are now ready to use it with Microsoft Power BI.

Let’s use the Netherlands Restaurants example again:

  1. Start Power BI Desktop and go to Get Data.

  2. From the pop-up Get Data window, go to Other ‣ ODBC. Click Connect.

  3. From the drop-down menu in the next dialog, select GraphDB-Test.

  4. In the next dialog window, enter username “admin” and password “root” (the password is only mandatory if GraphDB security is enabled).

  5. In the Navigator window that appears, you can now see the GraphDB directory and the tables it contains - Restaurant_Data and Restaurant_Location. Select the tables and click Load.

    _images/powerBI-tables.png
  6. To visualize the data as a geographic map (similar to the Tableau example above), select the Report option on the left, and then the Map icon from the Visualizations options on the right.

  7. You can experiment with the Fields that you want visualized, for example: selecting City will display all the locations in the dataset.

    _images/powerBI-locations.png
  8. You can also view the data in table format, as well as see the way the two tables are connected, by using the Data and Model views on the left.

How it works: Table description

As mentioned above, each SQL table is described by a SPARQL query that also includes some metadata defining the SQL columns, their types, and the expected RDF type. For the restaurant_data example, it will look like this:

# !column : restaurant_name : string not null
# !column : short_description : string
# !column : long_description : string
# !column : calendar : string

PREFIX ex:<http://example.com/ex>
PREFIX base:<http://example/base/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

select ?restaurant_name ?short_description ?long_description ?calendar where {
    ?s a base:Restaurant;
        rdfs:label ?restaurant_name;
        ex:shortDescription ?short_description;
        ex:longDescription ?long_description;
        ex:calendar ?calendar.
    # !filter
}

It is generated as an .rq file upon creation of a SQL table from the Workbench, and is automatically saved in a newly created sql subdirectory in the respective repository folder. In our case, this would be:

<graphdb-distribution>/data/repositories/nl_restaurants/sql/restaurant_data
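
The descriptor for the restaurant_location view follows the same pattern. Assuming the Suggest button proposed string for the textual columns and double for the coordinates (the actual suggestions depend on the xsd types in your data), its column metadata would look roughly like this, followed by the SPARQL query from the creation step above:

# !column : restaurant_name : string not null
# !column : city : string
# !column : country : string
# !column : address : string
# !column : zipcode : string
# !column : latitude : double
# !column : longitude : double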
