Loading data using OntoRefine¶
OntoRefine - overview and features¶
GraphDB OntoRefine is a data transformation tool based on OpenRefine and integrated into the GraphDB Workbench. It converts tabular data into RDF and imports it into a GraphDB repository using simple SPARQL queries and a virtual endpoint. The supported formats are TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google Sheets. Using OntoRefine, you can easily filter your data, edit its inconsistencies, convert it into RDF, and import it into a repository.
OntoRefine enables you to:
- Upload your data file(s) and create a project.
- View the cleaned data as RDF.
- Transform your data using SPIN functions.
- Import the newly created RDF directly into a GraphDB repository using the GraphDB SPARQL endpoint.
Example Data¶
The data used for the examples can be found at:
Upload data in OntoRefine¶
Open OntoRefine in the Workbench¶
To transform your data into RDF, you need a working GraphDB database.
Start GraphDB in Workbench mode:
graphdb
Open http://localhost:7200/ in a browser.
Go to Import -> Tabular (OntoRefine).
All data files in OntoRefine are organized as projects. One project can have more than one data file.
The Create Project action area consists of three tabs corresponding to the source of the data. You can upload a file from your computer, specify the URL of a publicly accessible data file, or paste data from the clipboard.
Create a project¶
Click Create Project -> Get data from.
Select one or more files to upload:
from your computer
from web addresses (URLs)
from clipboard
Click Next.
(Optional) Change the table configuration and update the preview.
When the file is first opened, OntoRefine tries to recognize the text encoding and the delimiters.
Click Create Project.
The result is a table similar to that of an Excel or a Google Sheets spreadsheet.
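The delimiter detection described above can be pictured with Python's standard csv module, whose Sniffer makes a similar guess from a sample of the file. This is only an illustrative sketch with made-up sample data, not OntoRefine's actual detection code:

```python
import csv
import io

# Hypothetical sample of an uploaded tabular file
sample = "Presidency;President;Took office\n1;George Washington;30/04/1789\n"

# Sniff the dialect (delimiter, quoting) from the sample,
# similar in spirit to OntoRefine's automatic delimiter detection
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
print(dialect.delimiter)  # ';'

# Parse the sample with the detected dialect
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[0])  # ['Presidency', 'President', 'Took office']
```

If detection guesses wrong for your file, the project preview step is where you override it manually.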
Import a project¶
To import an already existing OntoRefine project:
Go to Import Project.
Choose a file (.tar or .tar.gz).
Import it.
Open a project¶
Once the project is created:
- Go to Open Project.
- Click the one you want to work on.
- (Optional) Delete your project if you want to.

Viewing tabular data as RDF¶
Unlike OpenRefine, which supports RDF only as input, OntoRefine also supports RDF as output. To view your data as triples, click the OntoRefine RDF button. A settings screen appears, where you can configure the RDF data that OntoRefine generates.

You can change the IRIs generated for your data, or keep the defaults to obtain the following query. Click Run to see the results.

Warning
The OntoRefine SPARQL button sends queries only to the currently open OntoRefine project. It must not be confused with the GraphDB Workbench SPARQL tab, which is connected directly to the current repository.
RDFising data¶
You can transform your data in the OntoRefine SPARQL endpoint using a CONSTRUCT query.
Using CONSTRUCT query in the OntoRefine SPARQL endpoint¶
Construct new RDF data based on the RDFized model of your tabular data. Use SPIN functions to construct proper triples.
GraphDB 8.0 supports the following SPIN functions:
- SPARQL functions for splitting a string
- SPARQL functions for parsing dates
- SPARQL functions for encoding URIs
Parsing dates and encoding URIs¶
PREFIX mydata: <http://example.com/resource/>
PREFIX spif: <http://spinrdf.org/spif#>
# Example query returning RDF data
CONSTRUCT {
    ?presidentIRI a mydata:President ;
        mydata:tookOffice ?tookOfficeParsed ;
        mydata:leftOffice ?leftOfficeParsed ;
        mydata:nominatedBy ?Party ;
        mydata:homeState ?stateIRI
} WHERE {
    # Triple patterns for accessing each row and the columns it contains.
    # Note that no triples will be generated for NULL values in the table.
    # You should inspect your data in Refine mode and add OPTIONAL accordingly.
    ?row a mydata:Row ;
        mydata:Presidency ?Presidency ;
        mydata:President ?President ;
        mydata:Wikipedia_Entry ?Wikipedia_Entry ;
        mydata:Took_office ?Took_office ;
        mydata:Left_office ?Left_office ;
        mydata:Party ?Party ;
        mydata:Portrait ?Portrait ;
        mydata:Thumbnail ?Thumbnail ;
        mydata:Home_State ?Home_State .
    # Uses the SPIN function spif:parseDate to parse the dates
    bind(spif:parseDate(?Took_office, "dd/MM/yyyy") as ?tookOfficeParsed)
    bind(spif:parseDate(?Left_office, "dd/MM/yyyy") as ?leftOfficeParsed)
    # Uses several functions to construct IRIs for the presidents and their states
    bind(iri(concat("http://example.com/", spif:encodeURL(?President))) as ?presidentIRI)
    bind(iri(concat("http://example.com/", spif:encodeURL(?Home_State))) as ?stateIRI)
} LIMIT 100
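The two SPIN calls in the query have close Python analogues, which can help when checking the date pattern before running the query. Java's dd/MM/yyyy pattern corresponds to Python's %d/%m/%Y; note that the exact percent-encoding behavior of spif:encodeURL may differ slightly from urllib's quote, so this is only an approximation:

```python
from datetime import datetime
from urllib.parse import quote

# Roughly what spif:parseDate(?Took_office, "dd/MM/yyyy") does:
# parse a day/month/year string into a date value
took_office = datetime.strptime("30/04/1789", "%d/%m/%Y").date()
print(took_office.isoformat())  # '1789-04-30'

# Roughly what spif:encodeURL does when building IRIs:
# percent-encode characters that are not safe in a URL
president_iri = "http://example.com/" + quote("George Washington")
print(president_iri)  # 'http://example.com/George%20Washington'
```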
The rows and columns of the table are mapped with the help of the following URIs:
- mydata:Row - the RDF type for each row;
- mydata:rowNumber - the property for the row number as an integer literal;
- mydata:<column_name> - properties for each column.
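As an illustration only (the IRI scheme here is a simplified assumption, not OntoRefine's exact internal mapping), turning one table row into triples with this vocabulary could be sketched as:

```python
# Hypothetical sketch: one mydata:Row per table row, a mydata:rowNumber
# property, and one mydata:<column_name> property per column
# (spaces in column headers become underscores)
BASE = "http://example.com/resource/"

header = ["Presidency", "President", "Home State"]
row = ["1", "George Washington", "Virginia"]

# Start with the row-number triple, then add one triple per column
triples = [(f"{BASE}row1", f"{BASE}rowNumber", 1)]
for col, value in zip(header, row):
    predicate = BASE + col.replace(" ", "_")
    triples.append((f"{BASE}row1", predicate, value))

print(triples[1])  # the Presidency triple for row 1
```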
Importing data in GraphDB¶
When you are satisfied with the transformation of your data, you can import it in the current repository without leaving the GraphDB Workbench.
Click Data -> Open in main SPARQL endpoint.
The query is the same as the previous one, with the addition of a SERVICE clause. You only have to change CONSTRUCT to INSERT and remove the LIMIT. Instead of showing the RDF, GraphDB will insert it into the current repository.
Execute the query to import the results.
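The two manual edits described above, changing CONSTRUCT to INSERT and dropping the LIMIT, are purely textual. A toy sketch on a shortened stand-in query (not the full example above) shows what the edit amounts to:

```python
import re

# A shortened stand-in for the generated query, for illustration only
construct_query = """\
CONSTRUCT {
  ?s a <http://example.com/resource/President>
} WHERE {
  ?s a <http://example.com/resource/Row>
} LIMIT 100"""

# Swap the query form and drop the trailing LIMIT so that every
# generated triple is inserted rather than merely displayed
insert_query = re.sub(r"\bLIMIT\s+\d+\s*$", "",
                      construct_query.replace("CONSTRUCT", "INSERT"))
print(insert_query)
```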
Rows and Records¶
OpenRefine can view the same data both as rows and as records. See https://github.com/OpenRefine/OpenRefine/wiki/Variables#record and http://kb.refinepro.com/2012/03/difference-between-record-and-row.html.
Records¶
For the purpose of the demo, we can create records for each party. Go to Party column -> Edit column -> Move column to the beginning. By clicking Party column -> Text facet, we can see that some values contain spaces.

- Clean them via Party column -> Edit cells -> Transform, and type value.trim(). Observe how the text facet is updated.

- Now click Party column -> Sort -> OK (just use the defaults).
- Then click Sort -> Reorder rows permanently.

- Then apply Party column -> Edit cells -> Blank down, and click Records. The result looks like this:

- Now, the RDF button leads to the following query and results.

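The cleanup steps above (trim, sort, blank down) can be mimicked in plain Python on a few hypothetical rows, which makes it clear why consecutive rows with the same party end up forming one record:

```python
# Hypothetical (Party, President) rows; note the stray trailing space
rows = [
    ["Whig ", "Zachary Taylor"],
    ["Democratic", "Andrew Jackson"],
    ["Whig", "William Henry Harrison"],
]

# Edit cells -> Transform with value.trim(): strip surrounding whitespace
rows = [[cell.strip() for cell in row] for row in rows]

# Sort -> Reorder rows permanently, by the Party column
rows.sort(key=lambda row: row[0])

# Edit cells -> Blank down: clear a value equal to the one above it,
# so each run of identical party values becomes a single record
previous = None
for row in rows:
    if row[0] == previous:
        row[0] = ""
    else:
        previous = row[0]

print(rows)
```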
When there are records, the following URIs are used in addition to the row ones:
- mydata:recordNNN - identifies a record, where NNN is the 1-based record number;
- mydata:Record - the RDF type for each record;
- mydata:recordId - the property for the record ID (the value from the first column, which creates an OpenRefine record);
- mydata:hasRow - the property that links records to rows.
Splitting content in a cell¶
Load the movie dataset for this example. Sometimes you may have multiple values in the same cell. You can easily split them and generate separate URIs using SPIF functions.
PREFIX mydata: <http://example.com/resource/>
PREFIX spif: <http://spinrdf.org/spif#>
SELECT * {
    ?row a mydata:Row ;
        mydata:genres ?genres .
} LIMIT 100

PREFIX mydata: <http://example.com/resource/>
PREFIX spif: <http://spinrdf.org/spif#>
CONSTRUCT {
    ?m a mydata:Movie ;
        mydata:hasGenre ?genre
} WHERE {
    ?m a mydata:Row ;
        mydata:genres ?genres .
    ?genre spif:split (?genres "\\|")
} LIMIT 100
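The spif:split pattern is a regular expression, which is why the pipe is escaped as "\\|" in the query. Python's re.split behaves the same way, shown here on a made-up genres cell:

```python
import re

# A hypothetical multi-valued cell from the movie dataset
genres = "Action|Adventure|Sci-Fi"

# "|" is a regex metacharacter (alternation), so it must be escaped
# to be treated as a literal separator, just like "\\|" in the query
print(re.split(r"\|", genres))  # ['Action', 'Adventure', 'Sci-Fi']
```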
