Loading data using OntoRefine¶
What’s in this document?
OntoRefine - overview and features¶
GraphDB OntoRefine is a data transformation tool, based on OpenRefine and integrated in the GraphDB Workbench. It can be used for converting tabular data into RDF and importing it into a GraphDB repository, using simple SPARQL queries and a virtual endpoint. The supported formats are TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google sheet. Using OntoRefine, you can easily filter your data, edit its inconsistencies, convert it into RDF, and import it into a repository.
OntoRefine enables you to:
- Upload your data file(s) and create a project.
- View the cleaned data as RDF.
- Transform your data using SPIN functions.
- Import the newly created RDF directly in a GraphDB repository by using the GraphDB SPARQL endpoint.
Example Data¶
The data used for the examples can be found at:
Upload data in OntoRefine¶
Open OntoRefine in the Workbench¶
To transform your data into RDF, you need a working GraphDB database.
Start GraphDB in a workbench mode.
graphdb
Open http://localhost:7200/ in a browser.
Go to Import -> Tabular (OntoRefine).
All data files in OntoRefine are organized as projects. One project can have more than one data file.
The Create Project action area consists of three tabs corresponding to the source of data. You can upload a file from your computer, specify the URL of a publicly accessible data, or paste data from the clipboard.
Create a project¶
Click Create Project -> Get data from.
Select one or more files to upload:
from your computer
from web addresses (URLs)
from clipboard
Click Next.
(Optional) Change the table configurations and update the preview.
With the first opening of the file, OntoRefine tries to recognize the encoding of the text file and all delimiters.
Click Create Project.
The result is table similar to that of an Excel or a Google sheet.
Import a project¶
To import an already existing OntoRefine project:
Go to Import Project.
Choose a file (.tar or .tar.gz)
Import it.
Open a project¶
Once the project is created
- Go to Open Project.
- Click the one you want to work on.
- (Optional) Rename your project or delete it.

Viewing tabular data as RDF¶
Unlike OpenRefine that supports RDF for an input, OntoRefine also supports RDF as an output. To view your data as triples, click the OntoRefine SPARQL button. OpenRefine can view the same data both as rows and as records. See https://github.com/OpenRefine/OpenRefine/wiki/Variables#record and http://kb.refinepro.com/2012/03/difference-between-record-and-row.html.

Warning
OntoRefine SPARQL button sends queries only to the currently open OntoRefine project and must not be mistaken with the GraphDB Workbench SPARQL tab, which is directly connected to the current repository.
Rows¶
The rows and columns from the table are mapped with the help of the following URIs:
urn:rowNNN
- identifies a row, NNN is the 1-based row number;urn:Row
- the RDF type for each row;urn:rowNumber
- the property for row number, i.e. the same information as inurn:row_NNN_
but as an integer literal;urn:col:MMM
- properties for each named column, MMM is the name of the column;urn:colNNN
- properties for each named or unnamed column, NNN is the 1-based column number.
Note
Columns can be accessed both through the name-based urn:col:MMM
and the number-based urn:colNNN
properties.
Example¶
Refine table¶

RDF data¶
# A basic SELECT that illustrates how to select the raw RDF by column names.
select * {
?presidentRow a <urn:Row> ;
<urn:col:President> ?president ;
<urn:col:Presidency> ?presidency ;
<urn:col:Took_office> ?tookOffice ;
<urn:col:Left_office> ?leftOffice ;
<urn:col:Party> ?party ;
<urn:col:Home_State> ?state .
}

Records¶
If there are records, the following URIs will be used in addition to the row ones:
urn:recordNNN
- identifies a record, NNN is the 1-based record number;urn:Record
- the RDF type for each record;urn:recordNumber
- the property for record number, i.e. the same information as inurn:record_NNN_
but as an integer literal;urn:recordId
- the property for record id (the value from the first column that creates an OpenRefine record);urn:hasRow
- the property that links records to rows.
RDFising data¶
You can transform your data in the OntoRefine SPARQL endpoint or directly in the GraphDB SPARQL endpoint using a CONSTRUCT query.
Using CONSTRUCT query in the OntoRefine SPARQL endpoint¶
GraphDB 8.O supports SPIN functions:
- SPARQL functions for splitting a string
- SPARQL functions for parsing dates
- SPARQL functions for encoding URIs
Parsing dates and encoding URIs¶
prefix spif: <http://spinrdf.org/spif#>
prefix pres: <http://example.com/president/>
# A CONSTRUCT query that maps the raw RDF data from OntoRefine to user-specified
# RDF data (different IRIs, types, property names and dates as date-typed literals)
construct {
?presidentIRI a pres:President ;
pres:tookOffice ?tookOfficeParsed ;
pres:leftOffice ?leftOfficeParsed ;
pres:nominatedBy ?party ;
pres:homeState ?stateIRI
} where {
?presidentRow a <urn:Row> ;
<urn:col:President> ?president ;
<urn:col:Presidency> ?presidency ;
<urn:col:Took_office> ?tookOffice ;
<urn:col:Left_office> ?leftOffice ;
<urn:col:Party> ?party ;
<urn:col:Home_State> ?state .
# Uses SPIN function to parse the dates
bind(spif:parseDate(?tookOffice, "dd/MM/yyyy") as ?tookOfficeParsed)
bind(spif:parseDate(?leftOffice, "dd/MM/yyyy") as ?leftOfficeParsed)
# Uses several functions to construct IRIs for the presidents and their states
bind(iri(concat("http://example.com/", spif:encodeURL(?president))) as ?presidentIRI)
bind(iri(concat("http://example.com/", spif:encodeURL(?state))) as ?stateIRI)
}

Splitting content in a cell¶
PREFIX spif: <http://spinrdf.org/spif#>
select ?m ?genres
where {
?m a <urn:Row> ;
<urn:col:genres> ?genres .
}

PREFIX spif: <http://spinrdf.org/spif#>
construct {
?m a <urn:Movie> ;
<urn:hasGenre> ?genre
} where {
?m a <urn:Row> ;
<urn:col:genres> ?genres .
?genre spif:split (?genres "\\|")
}

Using CONSTRUCT query in the GraphDB SPARQL endpoint¶
The same queries can be executed in the GraphDB SPARQL endpoint. It is almost the same as executing the query in the OntoRefine SPARQL editor but you have to use SERVICE
to connect to OntoRefine’s SPARQL endpoint.
Click the link in the OntoRefine window to get the actual endpoint for your project.
Go to GraphDB SPARQL menu.
Execute the query.
prefix spif: <http://spinrdf.org/spif#> prefix pres: <http://example.com/president/> # A CONSTRUCT query that maps the raw RDF data from OntoRefine to user-specified # RDF data (different IRIs, types, property names and dates as date-typed literals) construct { ?presidentIRI a pres:President ; pres:tookOffice ?tookOfficeParsed ; pres:leftOffice ?leftOfficeParsed ; pres:nominatedBy ?party ; pres:homeState ?stateIRI } where { # Uses SERVICE to fetch the raw RDF data from OntoRefine service <http://localhost:7200/rdf-bridge/1502952420990> { ?presidentRow a <urn:Row> ; <urn:col:President> ?president ; <urn:col:Presidency> ?presidency ; <urn:col:Took_office> ?tookOffice ; <urn:col:Left_office> ?leftOffice ; <urn:col:Party> ?party ; <urn:col:Home_State> ?state . } # Uses SPIN function to parse the dates bind(spif:parseDate(?tookOffice, "dd/MM/yyyy") as ?tookOfficeParsed) bind(spif:parseDate(?leftOffice, "dd/MM/yyyy") as ?leftOfficeParsed) # Uses several functions to construct IRIs for the presidents and their states bind(iri(concat("http://example.com/", spif:encodeURL(?president))) as ?presidentIRI) bind(iri(concat("http://example.com/", spif:encodeURL(?state))) as ?stateIRI) }
The result will be:
Importing data in GraphDB¶
When you are satisfied with the transformation of your data, you can import it in the current repository without leaving the GraphDB Workbench.
Click the link in the OntoRefine window to get the actual endpoint for your project.
Go to GraphDB SPARQL menu.
Execute the query to import the results.
Hint
The query is the same as the previous one. You only have to change
CONSTRUCT
toINSERT
. Instead of showing the RDF, GraphDB will insert it into the current repository.prefix spif: <http://spinrdf.org/spif#> prefix pres: <http://example.com/president/> # An INSERT query that maps the raw RDF data from OntoRefine to user-specified # RDF data (different IRIs, types, property names and dates as date-typed literals) # and inserts the data into the current GraphDB repository. insert { ?presidentIRI a pres:President ; pres:tookOffice ?tookOfficeParsed ; pres:leftOffice ?leftOfficeParsed ; pres:nominatedBy ?party ; pres:homeState ?stateIRI } where { # Uses SERVICE to fetch the raw RDF data from OntoRefine service <http://localhost:8081/rdf-bridge/1738437518153> { ?presidentRow a <urn:Row> ; <urn:col:President> ?president ; <urn:col:Presidency> ?presidency ; <urn:col:Took_office> ?tookOffice ; <urn:col:Left_office> ?leftOffice ; <urn:col:Party> ?party ; <urn:col:Home_State> ?state . } # Uses SPIN function to parse the dates bind(spif:parseDate(?tookOffice, "dd/MM/yyyy") as ?tookOfficeParsed) bind(spif:parseDate(?leftOffice, "dd/MM/yyyy") as ?leftOfficeParsed) # Uses several functions to construct IRIs for the presidents and their states bind(iri(concat("http://example.com/", spif:encodeURL(?president))) as ?presidentIRI) bind(iri(concat("http://example.com/", spif:encodeURL(?state))) as ?stateIRI) }