## OntoRefine - overview and features¶

GraphDB OntoRefine is a data transformation tool, based on OpenRefine and integrated in the GraphDB Workbench. It can be used for converting tabular data into RDF and importing it into a GraphDB repository, using simple SPARQL queries and a virtual endpoint. The supported formats are TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google sheet. Using OntoRefine, you can easily filter your data, edit its inconsistencies, convert it into RDF, and import it into a repository.

OntoRefine enables you to:

• View the cleaned data as RDF.

• Transform your data using SPIN functions.

• Import the newly created RDF directly in a GraphDB repository by using the GraphDB SPARQL endpoint.

## Example Data¶

The data used for the examples can be found at:

### Open OntoRefine in the Workbench¶

To transform your data into RDF, you need a working GraphDB database.

1. Start GraphDB in Workbench mode.

2. Open http://localhost:7200/ in a browser.

3. Go to Import -> Tabular (OntoRefine).

All data files in OntoRefine are organized as projects. One project can have more than one data file.

The Create Project action area consists of three tabs corresponding to the source of data. You can upload a file from your computer, specify the URL of a publicly accessible data, or paste data from the clipboard.

### Create a project¶

1. Click Create Project -> Get data from.

2. Select one or more files to upload:

• from clipboard

3. Click Next.

4. (Optional) Change the table configurations and update the preview.

With the first opening of the file, OntoRefine tries to recognize the encoding of the text file and all delimiters.

5. Click Create Project.

The result is a table similar to that of an Excel or a Google sheet.

### Import a project¶

To import an already existing OntoRefine project:

1. Go to Import Project.

2. Select a file (.tar or .tar.gz)

3. Import it.

### Open a project¶

Once the project is created:

1. Go to Open Project.

2. Click the one you want to work on.

3. (Optional) You can also delete your project if you want to.

## Viewing tabular data as RDF¶

Unlike OpenRefine that supports RDF for an input, OntoRefine also supports RDF as an output. To view your data in triples, click the OntoRefine RDF button. A settings screen appears to configure the RDF data that OntoRefine helps you generate.

You can change the IRIs generated for your data. Leaving the defaults will return the following query; click Run to obtain the results.

Warning

The OntoRefine SPARQL button sends queries only to the currently open OntoRefine project. It should not be mistaken with the GraphDB Workbench SPARQL tab, which is directly connected to the current repository.

## RDFizing data¶

You can transform your data in the OntoRefine SPARQL endpoint using a CONSTRUCT query.

### Using CONSTRUCT query in the OntoRefine SPARQL endpoint¶

Construct new RDF data based on the RDFized model of your tabular data. Use SPIN functions to construct proper triples.

GraphDB 8.O supports SPIN functions:

• SPARQL functions for splitting a string

• SPARQL functions for parsing dates

• SPARQL functions for encoding URIs

#### Parsing dates and encoding URIs¶

PREFIX mydata: <http://example.com/resource/>
PREFIX spif: <http://spinrdf.org/spif#>

# Example query returning RDF data
CONSTRUCT  {
?presidentIRI a mydata:President ;
mydata:tookOffice ?tookOfficeParsed ;
mydata:leftOffice ?leftOfficeParsed ;
mydata:nominatedBy ?party ;
mydata:homeState ?stateIRI
} WHERE {
# Triple patterns for accessing each row and the columns in contains
# Note that no triples will be generated for NULL values in the table
# You should inspect your data in Refine mode and add OPTIONAL accordingly
?row a mydata:Row ;
mydata:Presidency ?Presidency ;
mydata:President ?President ;
mydata:Wikipedia_Entry ?Wikipedia_Entry ;
mydata:Took_office ?Took_office ;
mydata:Left_office ?Left_office ;
mydata:Party ?Party ;
mydata:Portrait ?Portrait ;
mydata:Thumbnail ?Thumbnail ;
mydata:Home_State ?Home_State .

# Uses SPIN function to parse the dates
bind(spif:parseDate(?Took_office, "dd/MM/yyyy") as ?tookOfficeParsed)
bind(spif:parseDate(?Left_office, "dd/MM/yyyy") as ?leftOfficeParsed)
# Uses several functions to construct IRIs for the presidents and their states
bind(iri(concat("http://example.com/", spif:encodeURL(?President))) as ?presidentIRI)
bind(iri(concat("http://example.com/", spif:encodeURL(?Home_State))) as ?stateIRI)
} LIMIT 100


The rows and columns from the table are mapped with the help of the following URIs:

• mydata:Row - the RDF type for each row;

• mydata:rowNumber - the property for row number as an integer literal;

• mydata:<column_name> - properties for each column

## Importing data in GraphDB¶

Once you are satisfied with the transformation of your data, you can import it in the current repository without leaving the GraphDB Workbench.

1. Click Data -> Open in main SPARQL editor

2. The query is the same as the previous one, only with the addition of a SERVICE clause. You only have to change CONSTRUCT to INSERT and remove the LIMIT. Instead of showing the RDF, GraphDB will insert it into the current repository.

3. Execute the query to import the results.

## Rows and records¶

OpenRefine can view the same data both as rows and as records. See https://github.com/OpenRefine/OpenRefine/wiki/Variables#record and http://kb.refinepro.com/2012/03/difference-between-record-and-row.html.

### Records¶

For the purpose of the demo, we can create records for each party. Go to Party column -> Edit column -> Move column to the beginning. By clicking Party column -> Text facet, we can see that some values contain spaces.

1. Clean them by Party column -> Edit cells -> Transform and type value.trim(). Observe how the text facet is updated.

2. Now click Party column -> Sort -> OK. Just use the defaults.

3. Then Sort -> Reorder rows permanently

4. And Party column -> Edit cells -> Blank down. Click on Records. The result looks like this:

5. Now, the RDF button leads to the following query and results:

When there are records, the following URIs will be used in addition to the row ones:

• mydata:recordNNN - identifies a record, NNN is the 1-based record number;

• mydata:Record - the RDF type for each record;

• mydata:recordId - the property for record id (the value from the first column that creates an OpenRefine record);

• mydata:hasRow - the property that links records to rows.

#### Splitting content in a cell¶

Load the movie dataset for this example. Sometimes you may have multiple values in the same cell. You can easily split them and generate separate URIs using SPIF functions.

PREFIX mydata: <http://example.com/resource/>
PREFIX spif: <http://spinrdf.org/spif#>

SELECT * {
?row a mydata:Row ;
mydata:genres ?genres ;
} LIMIT 100

PREFIX mydata: <http://example.com/resource/>
PREFIX spif: <http://spinrdf.org/spif#>

CONSTRUCT {
?m a mydata:Movie ;
mydata:hasGenre ?genre
} WHERE {
?m a mydata:Row ;
mydata:genres ?genres .
?genre spif:split (?genres "\\|")
} LIMIT 100