Loading data using OntoRefine

OntoRefine - overview and features

GraphDB OntoRefine is a data transformation tool, based on OpenRefine and integrated in the GraphDB Workbench. It can be used for converting tabular data into RDF and importing it into a GraphDB repository, using simple SPARQL queries and a virtual endpoint. The supported formats are TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google sheet. Using OntoRefine, you can easily filter your data, edit its inconsistencies, convert it into RDF, and import it into a repository.

OntoRefine enables you to:

  • Upload your data file(s) and create a project.
  • View the cleaned data as RDF.
  • Transform your data using SPIN functions.
  • Import the newly created RDF directly in a GraphDB repository by using the GraphDB SPARQL endpoint.

Upload data in OntoRefine

Open OntoRefine in the Workbench

To transform your data into RDF, you need a working GraphDB database.

  1. Start GraphDB in a workbench mode.

    graphdb
    
  2. Open http://localhost:7200/ in a browser.

  3. Go to Import -> Tabular (OntoRefine).

All data files in OntoRefine are organized as projects. One project can have more than one data file.

The Create Project action area consists of three tabs corresponding to the source of data. You can upload a file from your computer, specify the URL of a publicly accessible data, or paste data from the clipboard.

Create a project

  1. Click Create Project -> Get data from.

  2. Select one or more files to upload:

    • from your computer

      _images/ontoRefine-upload-file.png
    • from web addresses (URLs)

      _images/ontoRefine-upload-file-url.png
    • from clipboard

      _images/ontoRefine-upload-file-clipboard.png
  3. Click Next.

  4. (Optional) Change the table configurations and update the preview.

    With the first opening of the file, OntoRefine tries to recognize the encoding of the text file and all delimiters.

    _images/ontoRefine-table-configurations.png
  5. Click Create Project.

    The result is table similar to that of an Excel or a Google sheet.

Import a project

To import an already existing OntoRefine project:

  1. Go to Import Project.

  2. Choose a file (.tar or .tar.gz)

  3. Import it.

    _images/ontoRefine-import-project.png

Open a project

Once the project is created

  1. Go to Open Project.
  2. Click the one you want to work on.
  3. (Optional) Rename your project or delete it.
_images/ontoRefine-rename-delete.png

Viewing tabular data as RDF

Unlike OpenRefine that supports RDF for an input, OntoRefine also supports RDF as an output. To view your data as triples, click the OntoRefine SPARQL button. OpenRefine can view the same data both as rows and as records. See https://github.com/OpenRefine/OpenRefine/wiki/Variables#record and http://kb.refinepro.com/2012/03/difference-between-record-and-row.html.

_images/ontoRefine-view-as-rdf.png

Warning

OntoRefine SPARQL button sends queries only to the currently open OntoRefine project and must not be mistaken with the GraphDB Workbench SPARQL tab, which is directly connected to the current repository.

Rows

The rows and columns from the table are mapped with the help of the following URIs:

  • urn:rowNNN - identifies a row, NNN is the 1-based row number;
  • urn:Row - the RDF type for each row;
  • urn:rowNumber - the property for row number, i.e. the same information as in urn:row_NNN_ but as an integer literal;
  • urn:col:MMM - properties for each named column, MMM is the name of the column;
  • urn:colNNN - properties for each named or unnamed column, NNN is the 1-based column number.

Note

Columns can be accessed both through the name-based urn:col:MMM and the number-based urn:colNNN properties.

Example

Refine table
_images/ontoRefine-raw-table.png
RDF data
# A basic SELECT that illustrates how to select the raw RDF by column names.
select * {
  ?presidentRow a <urn:Row> ;
        <urn:col:President> ?president ;
        <urn:col:Presidency> ?presidency ;
        <urn:col:Took_office> ?tookOffice ;
        <urn:col:Took_office> ?leftOffice ;
        <urn:col:Party> ?party ;
        <urn:col:Home_State> ?state .
}
_images/ontoRefine-raw-rdf.png

Records

If there are records, the following URIs will be used in addition to the row ones:

  • urn:recordNNN - identifies a record, NNN is the 1-based record number;
  • urn:Record - the RDF type for each record;
  • urn:recordNumber - the property for record number, i.e. the same information as in urn:record_NNN_ but as an integer literal;
  • urn:recordId - the property for record id (the value from the first column that creates an OpenRefine record);
  • urn:hasRow - the property that links records to rows.

Example

Refine table
_images/ontoRefine-record-table.png
RDF data
SELECT ?s ?p ?o
WHERE {
      ?s ?p ?o .
} LIMIT 50
_images/ontoRefine-record-view.png

RDFising data

You can transform your data in the OntoRefine SPARQL endpoint or directly in the GraphDB SPARQL endpoint using a CONSTRUCT query.

Using CONSTRUCT query in the OntoRefine SPARQL endpoint

GraphDB 8.O supports SPIN functions:

  • SPARQL functions for splitting a string
  • SPARQL functions for parsing dates
  • SPARQL functions for encoding URIs

Parsing dates and encoding URIs

prefix spif: <http://spinrdf.org/spif#>
prefix pres: <http://example.com/president/>
# A CONSTRUCT query that maps the raw RDF data from OntoRefine to user-specified
# RDF data (different IRIs, types, property names and dates as date-typed literals)
construct {
    ?presidentIRI a pres:President ;
        pres:tookOffice ?tookOfficeParsed ;
        pres:leftOffice ?leftOfficeParsed ;
        pres:nominatedBy ?party ;
        pres:homeState ?stateIRI
} where {
    ?presidentRow a <urn:Row> ;
        <urn:col:President> ?president ;
        <urn:col:Presidency> ?presidency ;
        <urn:col:Took_office> ?tookOffice ;
        <urn:col:Took_office> ?leftOffice ;
        <urn:col:Party> ?party ;
        <urn:col:Home_State> ?state .
    # Uses SPIN function to parse the dates
    bind(spif:parseDate(?tookOffice, "dd/MM/yyyy") as ?tookOfficeParsed)
    bind(spif:parseDate(?leftOffice, "dd/MM/yyyy") as ?leftOfficeParsed)
    # Uses several functions to construct IRIs for the presidents and their states
    bind(iri(concat("http://example.com/", spif:encodeURL(?president))) as ?presidentIRI)
    bind(iri(concat("http://example.com/", spif:encodeURL(?state))) as ?stateIRI)
}
_images/ontoRefine-parsing-dates.png

Splitting content in a cell

PREFIX spif: <http://spinrdf.org/spif#>
select ?m ?genres
where {
 ?m a <urn:Row> ;
     <urn:col:genres> ?genres .
}
_images/ontoRefine-splitting-cells1.png
PREFIX spif: <http://spinrdf.org/spif#>
construct {
  ?m a <urn:Movie> ;
    <urn:hasGenre> ?genre
} where {
  ?m a <urn:Row> ;
     <urn:col:genres> ?genres .
  ?genre spif:split (?genres "\\|")
}
_images/ontoRefine-splitting-cells2.png

Using CONSTRUCT query in the GraphDB SPARQL endpoint

The same queries can be executed in the GraphDB SPARQL endpoint. It is almost the same as executing the query in the OntoRefine SPARQL editor but you have to use SERVICE to connect to OntoRefine’s SPARQL endpoint.

  1. Click the link in the OntoRefine window to get the actual endpoint for your project.

    _images/ontoRefine-sparql-endpoint-permalink.png
  2. Go to GraphDB SPARQL menu.

  3. Execute the query.

    prefix spif: <http://spinrdf.org/spif#>
    prefix pres: <http://example.com/president/>
    # A CONSTRUCT query that maps the raw RDF data from OntoRefine to user-specified
    # RDF data (different IRIs, types, property names and dates as date-typed literals)
    construct {
       ?presidentIRI a pres:President ;
           pres:tookOffice ?tookOfficeParsed ;
           pres:leftOffice ?leftOfficeParsed ;
           pres:nominatedBy ?party ;
           pres:homeState ?stateIRI
     } where {
         # Uses SERVICE to fetch the raw RDF data from OntoRefine
         service <http://localhost:7200/rdf-bridge/1502952420990> {
             ?presidentRow a <urn:Row> ;
                 <urn:col:President> ?president ;
                 <urn:col:Presidency> ?presidency ;
                 <urn:col:Took_office> ?tookOffice ;
                 <urn:col:Took_office> ?leftOffice ;
                 <urn:col:Party> ?party ;
                 <urn:col:Home_State> ?state .
         }
         # Uses SPIN function to parse the dates
         bind(spif:parseDate(?tookOffice, "dd/MM/yyyy") as ?tookOfficeParsed)
         bind(spif:parseDate(?leftOffice, "dd/MM/yyyy") as ?leftOfficeParsed)
         # Uses several functions to construct IRIs for the presidents and their states
         bind(iri(concat("http://example.com/", spif:encodeURL(?president))) as ?presidentIRI)
         bind(iri(concat("http://example.com/", spif:encodeURL(?state))) as ?stateIRI)
      }
    
  4. The result will be:

    _images/graphdb-sparql-parsing-dates.png

Importing data in GraphDB

When you are satisfied with the transformation of your data, you can import it in the current repository without leaving the GraphDB Workbench.

  1. Click the link in the OntoRefine window to get the actual endpoint for your project.

    _images/ontoRefine-sparql-endpoint-permalink.png
  2. Go to GraphDB SPARQL menu.

  3. Execute the query to import the results.

    Hint

    The query is the same as the previous one. You only have to change CONSTRUCT to INSERT. Instead of showing the RDF, GraphDB will insert it into the current repository.

    prefix spif: <http://spinrdf.org/spif#>
    prefix pres: <http://example.com/president/>
    # An INSERT query that maps the raw RDF data from OntoRefine to user-specified
    # RDF data (different IRIs, types, property names and dates as date-typed literals)
    # and inserts the data into the current GraphDB repository.
    insert {
        ?presidentIRI a pres:President ;
            pres:tookOffice ?tookOfficeParsed ;
            pres:leftOffice ?leftOfficeParsed ;
            pres:nominatedBy ?party ;
            pres:homeState ?stateIRI
    } where {
        # Uses SERVICE to fetch the raw RDF data from OntoRefine
        service <http://localhost:8081/rdf-bridge/1738437518153> {
            ?presidentRow a <urn:Row> ;
                <urn:col:President> ?president ;
                <urn:col:Presidency> ?presidency ;
                <urn:col:Took_office> ?tookOffice ;
                <urn:col:Took_office> ?leftOffice ;
                <urn:col:Party> ?party ;
                <urn:col:Home_State> ?state .
        }
        # Uses SPIN function to parse the dates
        bind(spif:parseDate(?tookOffice, "dd/MM/yyyy") as ?tookOfficeParsed)
        bind(spif:parseDate(?leftOffice, "dd/MM/yyyy") as ?leftOfficeParsed)
        # Uses several functions to construct IRIs for the presidents and their states
        bind(iri(concat("http://example.com/", spif:encodeURL(?president))) as ?presidentIRI)
        bind(iri(concat("http://example.com/", spif:encodeURL(?state))) as ?stateIRI)
    }