General full-text search with the connectors

The GraphDB Connectors offer an excellent solution for indexing data with a well-known schema, for example indexing documents of type A, where each document has a field F1 that is reached by following the property chain composed of the IRIs P1 and P2.

The features described below add a more general full-text search functionality to the connectors, and can be used individually or combined as desired to meet the specific needs of the use case.

Note

For a more general approach to full-text search (FTS), GraphDB previously offered the Lucene FTS plugin, which allows indexing of arbitrary properties. The plugin is deprecated as of GraphDB 9.7.x and will be removed in a future version. Its drawbacks include no automatic synchronization when data is updated, an over-engineered design, and a lack of maintenance.

Useful connector features

The following connector features are useful when defining a connector for general full-text search:

Wildcard literal

This feature allows for indexing of literals without specifying the IRI of the predicate that leads to the literal. Use $literal as the last element of the property chain.

See more about wildcard literals in the Lucene connector.
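Because $literal is the last element of the chain, it may be preceded by regular predicates, in which case only literals attached to the objects of those predicates are indexed. The following is a minimal sketch under stated assumptions: the vocabulary IRIs (http://example.com/vocab#Document, http://example.com/vocab#author) and the connector name author_fts are hypothetical and are not part of this documentation or of the dataset used in the examples.

```sparql
PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>

# Hypothetical connector: for each entity of the (assumed) type Document,
# follow the (assumed) predicate author and index every literal
# attached to the resulting author node.
INSERT DATA {
    con-inst:author_fts con:createConnector '''
    {
      "fields": [
        {
          "fieldName": "authorText",
          "propertyChain": [
            "http://example.com/vocab#author",
            "$literal"
          ],
          "facet": false
        }
      ],
      "types": [
        "http://example.com/vocab#Document"
      ]
    }
''' .
}
```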

Field names derived from the predicate

This feature allows for having dynamic field names derived from the IRI of the last predicate in the property chain.

See more about field name transformations in the Lucene connector.

Any type or untyped indexing

Specify $any as the sole type to index all entities that have at least one RDF type, or $untyped as the sole type to index all entities regardless of whether they have any RDF type.

See more about types in the Lucene connector.
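As an illustration of the difference, a connector that should skip entities without any RDF type could use $any in the types parameter. This is a sketch only; the connector name typed_fts is hypothetical, and the remaining options are chosen to keep the definition minimal.

```sparql
PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>

# Hypothetical connector: index all literals of every entity
# that has at least one RDF type ("$any").
INSERT DATA {
    con-inst:typed_fts con:createConnector '''
    {
      "fields": [
        {
          "fieldName": "fts",
          "propertyChain": [
            "$literal"
          ],
          "facet": false
        }
      ],
      "types": [
        "$any"
      ]
    }
''' .
}
```

Replacing "$any" with "$untyped" in the same definition would also cover entities that have no rdf:type statement at all.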

Examples

All examples use the Star Wars RDF dataset. Download starwars-data.ttl and import it into a fresh repository before proceeding further.

Indexing all literals

To index all literals in the repository regardless of where they are attached in the graph, you can combine wildcard literal and untyped indexing. Create a connector such as:

PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>

INSERT DATA {
    con-inst:starwars_fts con:createConnector '''
    {
      "fields": [
        {
          "fieldName": "fts",
          "propertyChain": [
            "$literal"
          ],
          "facet": false
        }
      ],
      "languages": [
        ""
      ],
      "types": [
        "$untyped"
      ]
    }
''' .
}

The connector defines a single field, fts, that indexes all literals regardless of their predicate, because $literal is the last element of the property chain. The connector places no type requirement on the entities that lead to those literals: with $untyped in the types parameter, it indexes any entity regardless of whether it has an RDF type.

Since the Star Wars dataset contains literals in many different languages, we restrict the index definition further with the languages option. Specifying "" (the empty language) selects only literals without a language tag.

We can now search in this connector as usual, for example with the FTS query “skywalker”:

# Full-text search for "skywalker"
PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?entity ?label {
    [] a con-inst:starwars_fts ;
       con:query "skywalker" ;
       con:entities ?entity .
    ?entity rdfs:label ?label
    FILTER(lang(?label) = "")
}

We get many different results belonging to different types (showing only the first ten results):

SPARQL results for “skywalker”

| ?entity | ?label |
|---|---|
| <https://swapi.co/resource/human/43> | Shmi Skywalker |
| <https://swapi.co/resource/human/1> | Luke Skywalker |
| <https://swapi.co/resource/human/35> | Padmé Amidala |
| <https://swapi.co/resource/planet/1> | Tatooine |
| <https://swapi.co/resource/human/10> | Obi-Wan Kenobi |
| <https://swapi.co/resource/human/11> | Anakin Skywalker |
| <https://swapi.co/resource/droid/2> | C-3PO |
| <https://swapi.co/resource/human/4> | Darth Vader |
| <https://swapi.co/resource/droid/3> | R2-D2 |
| <https://swapi.co/resource/human/18> | Wedge Antilles |
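To see how strongly each entity matches, the query can also request the relevance score that the connector exposes through con:score. The sketch below reuses the starwars_fts connector from this example; ordering by score and limiting to ten rows are illustrative choices, not requirements.

```sparql
# Full-text search for "skywalker" with relevance scores
PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>

SELECT ?entity ?score {
    [] a con-inst:starwars_fts ;
       con:query "skywalker" ;
       con:entities ?entity .
    # The score is attached to each matching entity
    ?entity con:score ?score
}
ORDER BY DESC(?score)
LIMIT 10
```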

Indexing all literals in distinct fields

The above example indexes all literals into a single field, which is convenient for very rough full-text search. It can be fine-tuned by using field names derived from the predicate. In this example, we add "fieldNameTransform": "predicate.localName", which produces a separate field for every predicate whose object literal is indexed, with the field name derived from the local name of the predicate:

PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>

INSERT DATA {
    con-inst:starwars_fts2 con:createConnector '''
    {
      "fields": [
        {
          "fieldName": "fts",
          "fieldNameTransform": "predicate.localName",
          "propertyChain": [
            "$literal"
          ],
          "facet": false
        }
      ],
      "languages": [
        ""
      ],
      "types": [
        "$untyped"
      ]
    }
''' .
}

We can use this connector to do general full-text searches, but also more precise ones, such as a query only in the label of entities (the field label is the result of taking the local name of <http://www.w3.org/2000/01/rdf-schema#label> at indexing time):

# Full-text search for "skywalker" in the field "label"
PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?entity ?label {
    [] a con-inst:starwars_fts2 ;
       con:query "label:skywalker" ;
       con:entities ?entity .
    ?entity rdfs:label ?label
    FILTER(lang(?label) = "")
}

We get only three results back, namely the people who have “Skywalker” in their name:

SPARQL results for “skywalker” in the field “label”

| ?entity | ?label |
|---|---|
| <https://swapi.co/resource/human/43> | Shmi Skywalker |
| <https://swapi.co/resource/human/1> | Luke Skywalker |
| <https://swapi.co/resource/human/11> | Anakin Skywalker |
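When literals are spread over several fields, it can help to see which field produced a match. One way to sketch this is with the connector's snippet predicates (con:snippets, con:snippetField, con:snippetText), reusing the starwars_fts2 connector and the same field-restricted query. This assumes that the default field settings of the connector permit snippet extraction.

```sparql
# Show the matching field and a highlighted snippet for each hit
PREFIX con: <http://www.ontotext.com/connectors/lucene#>
PREFIX con-inst: <http://www.ontotext.com/connectors/lucene/instance#>

SELECT ?entity ?field ?snippet {
    [] a con-inst:starwars_fts2 ;
       con:query "label:skywalker" ;
       con:entities ?entity .
    # Each snippet node carries the field name and the matched text
    ?entity con:snippets _:s .
    _:s con:snippetField ?field ;
        con:snippetText ?snippet .
}
```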