# Solr GraphDB Connector

Note

This feature requires a GraphDB Enterprise license.

## Overview and features

The GraphDB Connectors provide extremely fast normal and faceted (aggregation) searches. Such searches are typically implemented by an external component or service such as Solr; the Connectors have the additional benefit of staying automatically up to date with the GraphDB repository data.

Note

GraphDB supports full-text search options as well.

The Connectors provide synchronization at the entity level, where an entity is defined as having a unique identifier (an IRI) and a set of properties and property values. In terms of RDF, this corresponds to a set of triples that have the same subject. In addition to simple properties (defined by a single triple), the Connectors support property chains. A property chain is defined as a sequence of triples where each triple’s object is the subject of the following triple.

The main features of the GraphDB Connectors are:

• maintaining an index that is always in sync with the data stored in GraphDB;

• multiple independent instances per repository;

• the entities for synchronization are defined by:

  • a list of fields (on the Solr side) and property chains (on the GraphDB side) whose values will be synchronized;

  • a list of rdf:type’s of the entities for synchronization;

  • a list of languages for synchronization (the default is all languages);

  • additional filtering by property and value;

• full-text search using native Solr queries;

• snippet extraction: highlighting of search terms in the search result;

• faceted search;

• sorting by any preconfigured field;

• paging of results using offset and limit;

• custom mapping of RDF types to Solr types.

Each feature is described in detail below.

## Usage

All interactions with the Solr GraphDB Connector are done through SPARQL queries.

There are three types of SPARQL queries:

• INSERT for creating, updating, and deleting connector instances;

• SELECT for listing connector instances and querying their configuration parameters;

• INSERT/SELECT for storing and querying data as part of the normal GraphDB data workflow.

In general, this corresponds to INSERT that adds or modifies data, and to SELECT that queries existing data.

Each connector implementation defines its own IRI prefix to distinguish it from other connectors. For the Solr GraphDB Connector, this is http://www.ontotext.com/connectors/solr#. Each command or predicate executed by the connector uses this prefix, e.g., http://www.ontotext.com/connectors/solr#createConnector to create a connector instance for Solr.

Individual instances of a connector are distinguished by unique names that are also IRIs. They have their own prefix to avoid clashing with any of the command predicates. For Solr, the instance prefix is http://www.ontotext.com/connectors/solr/instance#.

Sample data

All examples use the following sample data that describes five fictitious wines: Yoyowine, Franvino, Noirette, Blanquito and Rozova as well as the grape varieties required to make these wines. The minimum required ruleset level in GraphDB is RDFS.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix wine: <http://www.ontotext.com/example/wine#> .

wine:RedWine rdfs:subClassOf wine:Wine .
wine:WhiteWine rdfs:subClassOf wine:Wine .
wine:RoseWine rdfs:subClassOf wine:Wine .

wine:Merlo
rdf:type wine:Grape ;
rdfs:label "Merlo" .

wine:CabernetSauvignon
rdf:type wine:Grape ;
rdfs:label "Cabernet Sauvignon" .

wine:CabernetFranc
rdf:type wine:Grape ;
rdfs:label "Cabernet Franc" .

wine:PinotNoir
rdf:type wine:Grape ;
rdfs:label "Pinot Noir" .

wine:Chardonnay
rdf:type wine:Grape ;
rdfs:label "Chardonnay" .

wine:Yoyowine
rdf:type wine:RedWine ;
wine:hasSugar "dry" ;
wine:hasYear "2013"^^xsd:integer .

wine:Franvino
rdf:type wine:RedWine ;
wine:hasSugar "dry" ;
wine:hasYear "2012"^^xsd:integer .

wine:Noirette
rdf:type wine:RedWine ;
wine:hasSugar "medium" ;
wine:hasYear "2012"^^xsd:integer .

wine:Blanquito
rdf:type wine:WhiteWine ;
wine:hasSugar "dry" ;
wine:hasYear "2012"^^xsd:integer .

wine:Rozova
rdf:type wine:RoseWine ;
wine:hasSugar "medium" ;
wine:hasYear "2013"^^xsd:integer .


## Setup and maintenance

### Prerequisites

Solr core creation

To create new Solr cores on the fly, you have to use the custom admin handler provided with the Solr Connector.

1. Copy the solr-core-admin-handler.jar file from the /tools directory to the /configs/solr-home/ directory of the GraphDB distribution.

2. To start Solr, execute:

<path-to-solr-distribution>/bin/solr start -p 8934 -s /<path-to-solr-home>

Solr schema setup

To use the connector, the core’s schema from which the configuration will be copied (most of the time named collection1) must be configured to allow schema modifications. See “Managed Schema Definition in SolrConfig” on page 409 of the Apache Solr Reference Guide.

A good starting point is the configuration from example-schemaless in the Solr distribution.

Third-party component versions

This version of the Solr GraphDB Connector uses Solr version 8.11.1.

### Creating a connector instance

Creating a connector instance is done by sending a SPARQL query with the following configuration data:

• the name of the connector instance (e.g., my_index);

• a Solr instance to synchronize to;

• classes to synchronize;

• properties to synchronize.

The configuration data has to be provided as a JSON string representation and passed together with the create command.

You can create connectors via a Workbench dialog or by using a SPARQL update query (create command).

Whichever method you use when creating the connector via the Workbench, a pop-up screen shows the connector creation progress.

#### Using the Workbench

1. Go to Setup ‣ Connectors.

2. Click New Connector in the tab of the respective Connector type you want to create.

3. Fill in the configuration form.

4. Execute the CREATE statement from the form by clicking OK. Alternatively, you can view its SPARQL query by clicking View SPARQL Query, and then copy it to execute it manually or integrate it in automation scripts.

#### Using the create command

The create command is triggered by a SPARQL INSERT with the createConnector predicate. For example, the following command creates a connector instance called my_index, which synchronizes the wines from the sample data above.

To be able to use newlines and quotes without the need for escaping, here we use SPARQL’s multi-line string delimiter consisting of 3 apostrophes: '''...'''. You can also use 3 quotes instead: """...""".

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_index solr:createConnector '''
{
"solrUrl": "http://localhost:8983/solr",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "grape",
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
]
},
{
"fieldName": "sugar",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasSugar"
],
"analyzed": false,
"multivalued": false
},
{
"fieldName": "year",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasYear"
],
"analyzed": false
}
]
}
''' .
}


Note

One of the fields has "multivalued": false. This is explained further under Sorting.

The above command creates a new Solr connector instance that connects to the Solr instance accessible at port 8983 on the localhost as specified by the "solrUrl" key.

The "types" key defines the RDF type of the entities to synchronize. In the example, these are only entities of the type http://www.ontotext.com/example/wine#Wine (and its subtypes if RDFS or higher-level reasoning is enabled). The "fields" key defines the mapping from RDF to Solr. The basic building block is the property chain, i.e., a sequence of RDF properties where the object of each property is the subject of the following property. In the example, three pieces of information are mapped: the grape the wines are made of, the sugar content, and the year. Each chain is assigned a short and convenient field name: “grape”, “sugar”, and “year”. The field names are later used in the queries.

The field grape is an example of a property chain composed of more than one property. First, we take the wine’s madeFromGrape property, the object of which is an instance of the type Grape, and then we take the rdfs:label of this instance. The fields sugar and year are both composed of a single property that links the value directly to the wine.

The fields sugar and year contain discrete values, such as medium, dry, 2012, 2013, and thus it is best to specify the option analyzed: false as well. See analyzed in Defining fields for more information.
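When the create command is issued from automation scripts, the SPARQL update can be assembled from a plain configuration object. The following is a minimal Python sketch, not part of the connector: the helper name build_create_update is hypothetical, and it assumes the instance name is a valid SPARQL local name. It only builds the update string; sending it to GraphDB is left out.

```python
import json

SOLR_NS = "http://www.ontotext.com/connectors/solr#"
INSTANCE_NS = "http://www.ontotext.com/connectors/solr/instance#"
WINE_NS = "http://www.ontotext.com/example/wine#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_create_update(instance_name, config):
    """Return a SPARQL INSERT DATA string carrying the createConnector command.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(config, indent=2)
    # Inside a SPARQL '''...''' string a backslash starts an escape sequence,
    # so JSON escapes such as \" must have their backslashes doubled.
    body = body.replace("\\", "\\\\")
    if "'''" in body:
        raise ValueError("configuration must not contain ''' (it would end the SPARQL string)")
    return (
        f"PREFIX solr: <{SOLR_NS}>\n"
        f"PREFIX solr-index: <{INSTANCE_NS}>\n"
        "\n"
        "INSERT DATA {\n"
        f"  solr-index:{instance_name} solr:createConnector '''\n{body}\n''' .\n"
        "}\n"
    )


# The same configuration as in the example above, expressed as a dict.
wine_config = {
    "solrUrl": "http://localhost:8983/solr",
    "types": [WINE_NS + "Wine"],
    "fields": [
        {"fieldName": "grape",
         "propertyChain": [WINE_NS + "madeFromGrape", RDFS_LABEL]},
        {"fieldName": "sugar",
         "propertyChain": [WINE_NS + "hasSugar"],
         "analyzed": False,
         "multivalued": False},
        {"fieldName": "year",
         "propertyChain": [WINE_NS + "hasYear"],
         "analyzed": False},
    ],
}

update = build_create_update("my_index", wine_config)
print(update)
```

Keeping the configuration as a dict and serializing it with json.dumps avoids hand-written JSON mistakes such as trailing commas.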

#### Schema and core management

By default, GraphDB manages the Solr core and the Solr schema, creating, updating, or deleting them as needed. This makes it easier to use Solr as everything is done automatically. This behavior can be changed by the following options:

• manageCore: if true, GraphDB manages the core. true by default.

• manageSchema: if true, GraphDB manages the schema. true by default.

The automatic core management requires the custom Solr admin handler provided with the GraphDB distribution. For more information, see Solr core creation.

Note

If either of the options is set to false, you have to create, update or remove the core/schema manually and, in case Solr is misconfigured, the connector instance will not function correctly.

#### Using a non-managed schema

The present version provides no support for changing some advanced options, such as stop words, on a per-field basis. The recommended way to do this for now is to manage the schema yourself and tell the connector to just sync the object values in the appropriate fields. Here is an example:

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_index solr:createConnector '''
{
"solrUrl": "http://localhost:8983/solr",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "grape",
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
]
},
{
"fieldName": "sugar",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasSugar"
],
"analyzed": false,
"multivalued": false
},
{
"fieldName": "year",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasYear"
],
"analyzed": false
}
],
"manageSchema": "false"
}
''' .
}


This creates the same connector instance as above but it expects fields with the specified field names to be already present in the core as well as some internal GraphDB fields. For the example, you must have the following fields:

| Field name | Solr config |
|---|---|
| _graphdb_id | <field name="_graphdb_id" type="string" indexed="true" stored="true" required="true" multiValued="false"/> |
| grape | <field name="grape" type="text_general" indexed="true" stored="true" multiValued="true"/> |
| sugar | <field name="sugar" type="text_general" indexed="true" stored="true" multiValued="false"/> |
| year | <field name="year" type="tints" indexed="true" stored="true" multiValued="true"/> |

_graphdb_id is used internally by GraphDB and is always required.

#### Working with secured Solr

GraphDB can access a secured Solr instance by passing additional parameters.

To set up basic user authentication in the GraphDB Solr Connector, configure the solrBasicAuthUser and solrBasicAuthPassword parameters.

...
solr-index:my_index conn:createConnector '''
{
"hasProperty": "http://www.w3.org/2000/01/rdf-schema#comment",

Use the pseudo-IRI $untyped to sync entities regardless of whether they have any RDF type; see also the examples in General full-text search with the connectors.

languages (list of strings), optional, valid languages for literals

RDF data is often multilingual, but you can map only some of the languages represented in the literal values. This can be done by specifying a list of language ranges to be matched to the language tags of literals according to RFC 4647, Section 3.3.1. Basic Filtering. In addition, an empty range can be used to include literals that have no language tag. The list of language ranges maps all existing literals that have matching language tags.

fields (list of field objects), required, defines the mapping from RDF to Solr

The fields define exactly what parts of each entity will be synchronized as well as the specific details on the connector side. The field is the smallest synchronization unit and it maps a property chain from GraphDB to a field in Solr. The fields are specified as a list of field objects. At least one field object is required. Each field object has further keys that specify details.

• fieldName (string), required, the name of the field in Solr

The name of the field defines the mapping on the connector side. It is specified by the key fieldName with a string value. The field name is used at query time to refer to the field. There are few restrictions on the allowed characters in a field name, but to avoid unnecessary escaping (which depends on how Solr parses its queries), we recommend keeping the field names simple.

• fieldNameTransform (one of none, predicate or predicate.localName), optional, none by default

Defines an optional transformation of the field name. Although fieldName is always required, it is ignored if fieldNameTransform is predicate or predicate.localName.

  • none: The field name is supplied via the fieldName option.

  • predicate: The field name is equal to the full IRI of the last predicate of the chain, e.g., if the last predicate was http://www.w3.org/2000/01/rdf-schema#label, then the field name will be http://www.w3.org/2000/01/rdf-schema#label too.

  • predicate.localName: The field name is derived from the local name of the IRI of the last predicate of the chain, e.g., if the last predicate was http://www.w3.org/2000/01/rdf-schema#comment, then the field name will be comment.

See Indexing all literals in distinct fields for an example.

• propertyChain (list of IRIs), required, defines the property chain to reach the value

The property chain (propertyChain) defines the mapping on the GraphDB side. A property chain is defined as a sequence of triples where the entity IRI is the subject of the first triple, its object is the subject of the next triple and so on. In this model, a property chain with a single element corresponds to a direct property defined by a single triple. Property chains are specified as a list of IRIs where at least one IRI must be provided.

The IRI of the document will be synchronized to the special field "id" in Solr. You may use it to query Solr directly and retrieve the matching entity IRI.

See Copy fields for defining multiple fields with the same property chain.

See Multiple property chains per field for defining a field whose values are populated from more than one property chain.

See Indexing language tags for defining a field whose values are populated with the language tags of literals.

See Indexing the IRI of an entity for defining a field whose values are populated with the IRI of the indexed entity.

See Wildcard literal indexing for defining a field whose values are populated with literals regardless of their predicate.

• valueFilter (string), optional, specifies the value filter for the field

See also Entity filtering.

• defaultValue (string), optional, specifies a default value for the field

The default value (defaultValue) provides a means of specifying a default value for the field when the property chain has no matching values in GraphDB. The default value can be a plain literal, a literal with a datatype (xsd: prefix supported), a literal with language, or an IRI. It has no default value.

• indexed (boolean), optional, default true

If indexed, a field is available for Solr queries. True by default. This option corresponds to the property "indexed" in the Solr schema.

• stored (boolean), optional, default true

Fields can be stored in Solr and this is controlled by the Boolean option "stored". Stored fields are required for retrieving snippets. True by default. This option corresponds to the property "stored" in the Solr schema.

• analyzed (boolean), optional, default true

When literal fields are indexed in Solr, they will be analysed according to the analyser settings. Should you require that a given field is not analysed, you may use "analyzed". This option has no effect for IRIs (they are never analysed). True by default. This option affects the Solr type that is used for the field. True uses a type suitable for the values (i.e., text or numeric), while false uses the type "string", which is never analysed by Solr.

• multivalued (boolean), optional, default true

RDF properties and synchronized fields may have more than one value. If "multivalued" is set to true, all values will be synchronized to Solr. If set to false, only a single value will be synchronized. True by default. This option corresponds to the "multiValued" property in the Solr schema. Note that Solr cannot order results by multivalued fields, so you need to adjust your options accordingly.

• ignoreInvalidValues (boolean), optional, default false

Per-field option that controls what happens when a value cannot be converted to the requested (or previously detected) type. False by default.

Example use: when an invalid date literal like "2021-02-29"^^xsd:date (2021 is not a leap year) needs to be indexed as a date, or when an IRI needs to be indexed as a number.

Note that some conversions are always valid: any literal to an FTS field, any non-literal (IRI, blank node, embedded triple) to a non-analyzed field. When true, such values will be skipped with a note in the logs. When false, such values will break the transaction.

• datatype (string), optional, the manual datatype override

By default, the Solr GraphDB Connector uses the datatype of literal values to determine how they must be mapped to Solr types. For more information on the supported datatypes, see Datatype mapping.

The mapping can be overridden through the property “datatype”, which can be specified per field. The value of “datatype” can be any of the xsd: types supported by the automatic mapping or a native Solr type prefixed by native:, e.g., both xsd:long and native:tlongs map to the tlongs type in Solr.

valueFilter (string), optional, specifies the top-level value filter for the document

See also Entity filtering.

documentFilter (string), optional, specifies the top-level document filter for the document

See also Entity filtering.

### Updating parameters at runtime

As mentioned above, the following connector parameters can be updated at runtime without having to rebuild the index:

• solrUrl

• bulkUpdateBatchSize

• solrBasicAuthUser

• solrBasicAuthPassword

This can be done by executing the following SPARQL update, here with examples for changing the user and password:

PREFIX conn:<http://www.ontotext.com/connectors/solr#>
PREFIX inst:<http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
inst:properIndex conn:updateConnector '''
{
"solrBasicAuthUser": "foo",
"solrBasicAuthPassword": "bar"
}
'''.
}

### Special field definitions

#### Copy fields

Often, it is convenient to synchronize one and the same data multiple times with different settings to accommodate different use cases, e.g., faceting or sorting vs full-text search. The Solr GraphDB Connector has explicit support for fields that copy their value from another field. This is achieved by specifying a single element in the property chain of the form @otherFieldName, where otherFieldName is another non-copy field. Take the following example:

...
"fields": [
{
"fieldName": "grape",
"facet": false,
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
],
"analyzed": true
},
{
"fieldName": "grapeFacet",
"propertyChain": [
"@grape"
],
"analyzed": false
}
]
...

The snippet creates an analysed field “grape” and a non-analysed field “grapeFacet”. Both fields are populated with the same values, and “grapeFacet” is defined as a copy field that refers to the field “grape”.

Note

The connector handles copy fields in a more optimal way than specifying a field with exactly the same property chain as another field.

#### Multiple property chains per field

Sometimes, you have to work with data models that define the same concept (in terms of what you want to index in Solr) with more than one property chain, e.g., the concept of “name” could be defined as a single canonical name, multiple historical names and some unofficial names. If you want to index these together as a single field in Solr, you can define this as a multiple property chains field.

Fields with multiple property chains are defined as a set of separate virtual fields that will be merged into a single physical field when indexed. Virtual fields are distinguished by the suffix $xyz, where xyz is any alphanumeric sequence of convenience. For example, we can define the fields name$1 and name$2 like this:

...
"fields": [
{
"fieldName": "name$1",
"propertyChain": [
"http://www.ontotext.com/example#canonicalName"
]
},
{
"fieldName": "name$2",
"propertyChain": [
"http://www.ontotext.com/example#historicalName"
]
...
},
...


The values of the fields name$1 and name$2 will be merged and synchronized to the field name in Solr.

Note

You cannot mix suffixed and unsuffixed fields with the same name, e.g., if you defined myField$new and myField$old you cannot have a field called just myField.

##### Filters and fields with multiple property chains
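The merge of virtual fields into one physical field can be pictured with a short Python sketch. It is an illustration of the naming rule only, not the connector's implementation; the function names and the sample values are hypothetical.

```python
from collections import defaultdict


def physical_name(field_name):
    """Strip the $suffix of a virtual field name, e.g. 'name$1' -> 'name'."""
    return field_name.split("$", 1)[0]


def merge_virtual_fields(values_by_field):
    """Merge the values of virtual fields (name$1, name$2, ...) into their physical field.

    Raises ValueError when a suffixed field and an unsuffixed field share a name,
    mirroring the restriction described in the note above.
    """
    names = set(values_by_field)
    for name in names:
        if "$" in name and physical_name(name) in names:
            raise ValueError(
                f"cannot mix suffixed field {name!r} with unsuffixed field {physical_name(name)!r}"
            )
    merged = defaultdict(list)
    for name, values in values_by_field.items():
        merged[physical_name(name)].extend(values)
    return dict(merged)


# Hypothetical values collected for one entity.
collected = {
    "name$1": ["Sofia"],               # canonical name
    "name$2": ["Serdica", "Sredets"],  # historical names
    "country": ["Bulgaria"],
}
print(merge_virtual_fields(collected))
```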

Filters can be used with fields defined with multiple property chains. Both the physical field values and the individual virtual field values are available:

• Physical fields are specified without the suffix, e.g., ?myField

• Virtual fields are specified with the suffix, e.g., ?myField$2 or ?myField$alt.

Note

Physical fields cannot be combined with parent() as their values come from different property chains. If you really need to filter the same parent level, you can rewrite parent(?myField) in (<urn:x>, <urn:y>) as parent(?myField$1) in (<urn:x>, <urn:y>) || parent(?myField$2) in (<urn:x>, <urn:y>) || parent(?myField$3) ... and surround it with parentheses if it is a part of a bigger expression.

#### Indexing language tags

The language tag of an RDF literal can be indexed by specifying a property chain, where the last element is the pseudo-IRI lang(). The property preceding lang() must lead to a literal value. For example,

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_index solr:createConnector '''
{
"solrUrl": "http://localhost:8984/solr",
"types": ["http://www.ontotext.com/example#gadget"],
"fields": [
{
"fieldName": "name",
"propertyChain": [
"http://www.ontotext.com/example#name"
]
},
{
"fieldName": "nameLanguage",
"propertyChain": [
"http://www.ontotext.com/example#name",
"lang()"
]
}
]
}
''' .
}

The above connector will index the language tag of each literal value of the property http://www.ontotext.com/example#name into the field nameLanguage.

#### Indexing named graphs

The named graph of a given value can be indexed by ending a property chain with the special pseudo-IRI graph(). Indexing the named graph of the value instead of the value itself allows searching by named graph.

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_index solr:createConnector '''
{
"solrUrl": "http://localhost:8983/solr",
"types": ["http://www.ontotext.com/example#gadget"],
"fields": [
{
"fieldName": "name",
"propertyChain": [
"http://www.ontotext.com/example#name"
]
},
{
"fieldName": "nameGraph",
"propertyChain": [
"http://www.ontotext.com/example#name",
"graph()"
]
}
]
}
''' .
}

The above connector will index the named graph of each value of the property http://www.ontotext.com/example#name into the field nameGraph.

#### Wildcard literal indexing

In this mode, the last element of a property chain is a wildcard that will match any predicate that leads to a literal value. Use the special pseudo-IRI $literal as the last element of the property chain to activate it.

Note

Currently, it really means any literal, including literals with data types.

For example:

{
"fields" : [ {
"propertyChain" : [ "$literal" ],
"fieldName" : "name"
}, {
"propertyChain" : [ "http://example.com/description", "$literal" ],
"fieldName" : "description"
}
...
}


See Indexing all literals for a detailed example.

#### Indexing the IRI of an entity

Sometimes you may need the IRI of each entity (e.g., http://www.ontotext.com/example/wine#Franvino from our small example dataset) indexed as a regular field. This can be achieved by specifying a property chain with a single property referring to the pseudo-IRI $self. For example,

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_index solr:createConnector '''
{
"solrUrl": "http://localhost:8983/solr",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "entityId",
"propertyChain": [
"$self"
]
},
{
"fieldName": "grape",
"propertyChain": [
"http://www.ontotext.com/example/wine#madeFromGrape",
"http://www.w3.org/2000/01/rdf-schema#label"
]
}
]
}
''' .
}


The above connector will index the IRI of each wine into the field entityId.

Note

GraphDB also uses the IRI of each entity as the ID of each document in Solr, which is represented by the field id.

## Datatype mapping

The Solr GraphDB Connector maps different types of RDF values to different types of Solr values according to the basic type of the RDF value (IRI or literal) and the datatype of literals. The autodetection uses the following mapping:

| RDF value | RDF datatype | Solr type |
|---|---|---|
| IRI | n/a | string |
| literal | any type not explicitly mentioned below | text_general |
| literal | with one of the language tags en, de, es, ru | text_xx where xx is language dependent |
| literal | xsd:boolean | boolean |
| literal | xsd:double | pdouble (single value), pdoubles (multivalued) |
| literal | xsd:float | pfloat (single value), pfloats (multivalued) |
| literal | xsd:long | plong (single value), plongs (multivalued) |
| literal | xsd:int | pint (single value), pints (multivalued) |
| literal | xsd:dateTime | pdate (single value), pdates (multivalued) |
| literal | xsd:date | pdate (single value), pdates (multivalued) |
| literal | xsd:gYear | pdate (single value), pdates (multivalued) |
| literal | xsd:gYearMonth | pdate (single value), pdates (multivalued) |

The datatype mapping can be affected by the synchronization options, too. For example, a non-analysed field that has xsd:long values does not use plong or plongs but string instead.

Note

For any given field, the automatic mapping uses the first value it sees. This works fine for clean datasets but might lead to problems if your dataset has non-normalised data, e.g., the first value has no datatype but other values do.

It is therefore recommended to set datatype to a fixed value, e.g. xsd:date.

Please note that the commonly used xsd:integer and xsd:decimal datatypes are not indexed as numbers because they represent infinite precision numbers. You can override that by using the datatype option to cast to xsd:long, xsd:double, xsd:float as appropriate.
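The autodetection rules can be summarized in a few lines of Python. This is a sketch derived only from the mapping table and the notes in this section, not the connector's code; the function name solr_type and its parameters are hypothetical, and it assumes language tags are matched exactly.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Single-value Solr types from the mapping table; the multivalued variant appends "s".
NUMERIC_AND_DATE_TYPES = {
    XSD + "double": "pdouble",
    XSD + "float": "pfloat",
    XSD + "long": "plong",
    XSD + "int": "pint",
    XSD + "dateTime": "pdate",
    XSD + "date": "pdate",
    XSD + "gYear": "pdate",
    XSD + "gYearMonth": "pdate",
}

# Language tags that have a dedicated text_xx type in the table.
LANGUAGE_TYPES = {"en", "de", "es", "ru"}


def solr_type(is_iri, datatype=None, language=None, analyzed=True, multivalued=True):
    """Return the Solr type name that the table above assigns to a value."""
    if is_iri or not analyzed:
        # IRIs and non-analysed fields always use "string".
        return "string"
    if language in LANGUAGE_TYPES:
        return "text_" + language
    if datatype == XSD + "boolean":
        return "boolean"
    base = NUMERIC_AND_DATE_TYPES.get(datatype)
    if base is None:
        # Everything else, including xsd:integer and xsd:decimal, is indexed as text.
        return "text_general"
    return base + "s" if multivalued else base


print(solr_type(False, XSD + "long", multivalued=False))
print(solr_type(False, XSD + "integer"))
print(solr_type(False, XSD + "long", analyzed=False))
```

The xsd:integer case shows why an explicit datatype override is useful for year-like values such as the hasYear property in the sample data.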

### Date and time conversion

RDF and Solr use slightly different models to represent dates and times, even though the values might look very similar.

Years in RDF values use the XSD format and are era years, where positive values denote the common era and negative values denote years before the common era. There is no year zero.

Years in Solr use the ISO format and are proleptic years, i.e., positive values denote years of the common era, and earlier years simply continue counting down by one, so there is a year zero.

In short:

• year 2020 CE = year 2020 in XSD = year 2020 in ISO.

• year 1 CE = year 1 in XSD = year 1 in ISO.

• year 1 BCE = year -1 in XSD = year 0 in ISO.

• year 2 BCE = year -2 in XSD = year -1 in ISO.

All years coming from RDF literals will be converted to ISO before indexing in Solr.
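The year conversion listed above amounts to shifting non-positive era years by one. A minimal Python sketch, with a hypothetical function name, illustrates the arithmetic:

```python
def xsd_year_to_iso(xsd_year):
    """Convert an XSD era year (no year zero) to an ISO proleptic year."""
    if xsd_year == 0:
        raise ValueError("XSD era years have no year zero")
    # Common-era years are unchanged; BCE years move up by one (1 BCE -> 0).
    return xsd_year if xsd_year > 0 else xsd_year + 1


for year in (2020, 1, -1, -2):
    print(year, "->", xsd_year_to_iso(year))
```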

Both XSD and ISO date and time values support timezones. Solr requires all date and time values to be normalized to the UTC timezone, so the Solr connector will convert the values accordingly before sending them to Solr for indexing.

In addition to that, XSD defines the lack of a timezone as undetermined. Since we do not want to have any undetermined state in the indexing system, we define the undetermined time zone as UTC, i.e., "2020-02-14T12:00:00"^^xsd:dateTime is equivalent to "2020-02-14T12:00:00Z"^^xsd:dateTime (Z is the UTC time zone, also known as +00:00).

Also note that XSD dates and partial dates, e.g., xsd:gYear values, may have a timezone, which leads to additional complications. E.g., "2020+02:00"^^xsd:gYear (the year 2020 in the +02:00 timezone) will be normalized to 2019-12-31T22:00:00Z (the previous year!) if strict timezone adherence is followed. We have chosen to ignore the timezone on any values that do not have an associated time value, e.g.:

• "2020-02-15+02:00"^^xsd:date

• "2020-02+02:00"^^xsd:gYearMonth

• "2020+02:00"^^xsd:gYear

All of the above will be treated as if they specified UTC as their timezone.
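The timezone rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the connector's implementation: the function names are hypothetical, only four-digit positive years are handled, fractional seconds are dropped, and a date-only value is assumed to be represented as midnight UTC of that day.

```python
import re
from datetime import datetime, timezone

_DATE_WITH_OPTIONAL_TZ = re.compile(r"(\d{4}-\d{2}-\d{2})(Z|[+-]\d{2}:\d{2})?")


def normalize_datetime(lexical):
    """Normalize an xsd:dateTime lexical value to UTC; a missing timezone means UTC."""
    if lexical.endswith("Z"):
        # datetime.fromisoformat() accepts "Z" only in newer Python versions.
        lexical = lexical[:-1] + "+00:00"
    value = datetime.fromisoformat(lexical)
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def normalize_date(lexical):
    """Normalize an xsd:date lexical value, ignoring any timezone it carries."""
    match = _DATE_WITH_OPTIONAL_TZ.fullmatch(lexical)
    if match is None:
        raise ValueError(f"unsupported xsd:date value: {lexical!r}")
    # Assumption: the date is indexed as midnight UTC of the same calendar day.
    return match.group(1) + "T00:00:00Z"


print(normalize_datetime("2020-02-14T12:00:00"))        # no timezone: treated as UTC
print(normalize_datetime("2020-02-14T12:00:00+02:00"))  # shifted to UTC
print(normalize_date("2020-02-15+02:00"))               # timezone ignored
```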

## Entity filtering

The Solr connector supports three kinds of entity filters used to fine-tune the set of entities and/or individual values for the configured fields, based on the field value. Entities and field values are synchronized to Solr if, and only if, they pass the filter. The filters are similar to a FILTER() inside a SPARQL query but not exactly the same. In them, each configured field can be referred to by prefixing it with a ?, much like referring to a variable in SPARQL.

### Types of filters

Top-level value filter

The special field variable $this (and not ?this, ?$this, $?this) is used to refer to the current context. In the top-level value filter and the top-level document filter, it refers to the document. In the per-field value filter, it refers to the currently filtered field value. In the nested document filter, it refers to the nested document.

ALL() quantifier

In the context of document-level filtering, a match is true if at least one of potentially many field values matches, e.g., ?location = <urn:Europe> would return true if the document contains { "location": ["<urn:Asia>", "<urn:Europe>"] }.

In addition to this, you can also use the ALL() quantifier when you need all values to match, e.g., ALL(?location) = <urn:Europe> would not match with the above document because <urn:Asia> does not match.

Entity filters and default values

Entity filters can be combined with default values in order to get more flexible behavior.

If a field has no values in the RDF database, the defaultValue is used. But if a field has some values, defaultValue is NOT used, even if all values are filtered out. See an example in Basic entity filter.

A typical use case for an entity filter is having soft deletes, i.e., instead of deleting an entity, it is marked as deleted by the presence of a specific value for a given property.

### Two-variable filtering

Besides comparing a field value to one or more constants or running an existential check on the field value, some use cases also require comparing the field value to the value of another field in order to produce the desired result. GraphDB solves this by supporting two-variable filtering in the per-field value filter and the top-level document filter.

Note

This type of filtering is not possible in the top-level value filter because the only variable that is available there is $this.

In the top-level document filter, there are no restrictions as all values are available at the time of evaluation.

In the per-field value filter, two-variable filtering will reorder the defined fields such that values for other fields are already available when the current field’s filter is evaluated. For example, let’s say we defined a filter $this > ?salary for the field price. This will force the connector to process the field salary first, apply its per-field value filter if any, and only then start collecting and filtering the values for the field price.

Cyclic dependencies will be detected and reported as an invalid filter. For example, if in addition to the above we define a per-field value filter ?price > "1000"^^xsd:int for the field salary, a cyclic dependency will be detected as both price and salary will require the other field being indexed first.

### Basic entity filter example¶

Given the following RDF data:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix example: <http://www.ontotext.com/example#> .

# the entity below will be synchronized because it has a matching value for city: ?city in ("London")
example:alpha
rdf:type example:gadget ;
example:name "John Synced" ;
example:city "London" .

# the entity below will not be synchronized because it lacks the property completely: bound(?city)
example:beta
rdf:type example:gadget ;
example:name "Peter Syncfree" .

# the entity below will not be synchronized because it has a different city value:
# ?city in ("London") will remove the value "Liverpool" so bound(?city) will be false
example:gamma
rdf:type example:gadget ;
example:name "Mary Syncless" ;
example:city "Liverpool" .

If you create a connector instance such as:

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_index solr:createConnector '''
{
"solrUrl": "http://localhost:8983/solr",
"types": ["http://www.ontotext.com/example#gadget"],
"fields": [
{
"fieldName": "name",
"propertyChain": ["http://www.ontotext.com/example#name"]
},
{
"fieldName": "city",
"propertyChain": ["http://www.ontotext.com/example#city"],
"valueFilter": "$this = \\"London\\""
}
],
"documentFilter": "bound(?city)"
}
''' .
}


The entity :beta is not synchronized as it has no value for city.

To handle such cases, you can modify the connector configuration to specify a default value for city:

...
{
"fieldName": "city",
"propertyChain": ["http://www.ontotext.com/example#city"],
"defaultValue": "London"
}
...
}


The default value is used for the entity :beta as it has no value for city in the repository. As the value is “London”, the entity is synchronized.
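Putting the two pieces together, a complete definition might look like the sketch below. It assumes that the per-field valueFilter from the first definition is kept alongside the new defaultValue, and it uses a hypothetical instance name, my_index_with_default, so that it does not clash with the instance created earlier.

```sparql
PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

# Hypothetical instance name; the fields and filters are taken from the example above.
INSERT DATA {
    solr-index:my_index_with_default solr:createConnector '''
{
  "solrUrl": "http://localhost:8983/solr",
  "types": ["http://www.ontotext.com/example#gadget"],
  "fields": [
    {
      "fieldName": "name",
      "propertyChain": ["http://www.ontotext.com/example#name"]
    },
    {
      "fieldName": "city",
      "propertyChain": ["http://www.ontotext.com/example#city"],
      "valueFilter": "$this = \\"London\\"",
      "defaultValue": "London"
    }
  ],
  "documentFilter": "bound(?city)"
}
''' .
}
```

Under the rules described in Entity filters and default values, :alpha and :beta would be expected to synchronize, while :gamma would still be skipped: it has a city value in the repository, so the default is not applied once "Liverpool" is filtered out.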

Sometimes, data represented in RDF is not well suited to map directly to non-RDF. For example, if you have news articles and they can be tagged with different concepts (locations, persons, events, etc.), one possible way to model this is a single property :taggedWith. Consider the following RDF data:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix example2: <http://www.ontotext.com/example2#> .

example2:Berlin
rdf:type example2:Location ;
rdfs:label "Berlin" .

example2:Mozart
rdf:type example2:Person ;
rdfs:label "Wolfgang Amadeus Mozart" .

example2:Einstein
rdf:type example2:Person ;
rdfs:label "Albert Einstein" .

example2:Cannes-FF
rdf:type example2:Event ;
rdfs:label "Cannes Film Festival" .

example2:Article1
rdf:type example2:Article ;
rdfs:comment "An article about a film about Einstein's life while he was a professor in Berlin." ;
example2:taggedWith example2:Berlin ;
example2:taggedWith example2:Einstein ;
example2:taggedWith example2:Cannes-FF .

example2:Article2
rdf:type example2:Article ;
rdfs:comment "An article about Berlin." ;
example2:taggedWith example2:Berlin .

example2:Article3
rdf:type example2:Article ;
rdfs:comment "An article about Mozart's life." ;
example2:taggedWith example2:Mozart .

example2:Article4
rdf:type example2:Article ;
rdfs:comment "An article about classical music in Berlin." ;
example2:taggedWith example2:Berlin ;
example2:taggedWith example2:Mozart .

example2:Article5
rdf:type example2:Article ;
rdfs:comment "A boring article that has no tags." .

example2:Article6
rdf:type example2:Article ;
rdfs:comment "An article about the Cannes Film Festival in 2013." ;
example2:taggedWith example2:Cannes-FF .


Assume you want to map this data to Solr, so that the property example2:taggedWith x is mapped to separate fields taggedWithPerson and taggedWithLocation, according to the type of x (whereas we are not interested in Events). You can map taggedWith twice to different fields and then use an entity filter to get the desired values:

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_index solr:createConnector '''
{
"solrUrl": "http://localhost:8983/solr",
"types": ["http://www.ontotext.com/example2#Article"],
"fields": [
{
"fieldName": "comment",
"propertyChain": ["http://www.w3.org/2000/01/rdf-schema#comment"]
},
{
"fieldName": "taggedWithPerson",
"propertyChain": ["http://www.ontotext.com/example2#taggedWith"],
"valueFilter": "$this -> type = <http://www.ontotext.com/example2#Person>"
},
{
"fieldName": "taggedWithLocation",
"propertyChain": ["http://www.ontotext.com/example2#taggedWith"],
"valueFilter": "$this -> type = <http://www.ontotext.com/example2#Location>"
}
]
}
''' .
}


Note

type is the short way to write <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.

The six articles in the RDF data above will be mapped as such:

| Article IRI | Value in taggedWithPerson | Value in taggedWithLocation | Explanation |
|---|---|---|---|
| :Article1 | :Einstein | :Berlin | :taggedWith has the values :Einstein, :Berlin and :Cannes-FF. The filter leaves only the correct values in the respective fields. The value :Cannes-FF is ignored as it does not match the filter. |
| :Article2 | | :Berlin | :taggedWith has the value :Berlin. After the filter is applied, only taggedWithLocation is populated. |
| :Article3 | :Mozart | | :taggedWith has the value :Mozart. After the filter is applied, only taggedWithPerson is populated. |
| :Article4 | :Mozart | :Berlin | :taggedWith has the values :Berlin and :Mozart. The filter leaves only the correct values in the respective fields. |
| :Article5 | | | :taggedWith has no values. The filter is not relevant. |
| :Article6 | | | :taggedWith has the value :Cannes-FF. The filter removes it as it does not match. |

This can be checked by issuing a faceted search for taggedWithLocation and taggedWithPerson:

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?facetName ?facetValue ?facetCount {
?search a solr-index:my_index ;
solr:facetFields "taggedWithLocation,taggedWithPerson" ;
solr:facets [
solr:facetName ?facetName ;
solr:facetValue ?facetValue ;
solr:facetCount ?facetCount
]
}


If the filter was applied, you should get only :Berlin for taggedWithLocation and only :Einstein and :Mozart for taggedWithPerson:

| facetName | facetValue | facetCount |
|---|---|---|
| taggedWithLocation | http://www.ontotext.com/example2#Berlin | 3 |
| taggedWithPerson | http://www.ontotext.com/example2#Mozart | 2 |
| taggedWithPerson | http://www.ontotext.com/example2#Einstein | 1 |

## Overview of connector predicates¶

The following diagram shows a summary of all predicates that can administer (create, drop, check status) connector instances or issue queries and retrieve results. It can be used as a quick reference for what a particular predicate needs to be attached to. For example, to retrieve entities, you need to use :entities on a search instance, and to retrieve snippets, you need to use :snippets on an entity. Variables that are bound as a result of a query are shown in green, blank helper nodes are shown in blue, literals in red, and IRIs in orange. The predicates are represented by labeled arrows.

## SolrCloud support¶

From GraphDB 8.0/Connectors 6.0, the Solr connector has SolrCloud support. SolrCloud is the distributed version of Solr, which offers index sharding, better scaling, fault tolerance, etc. It uses Apache Zookeeper for distributed synchronization and central configuration of the Solr nodes. In SolrCloud, the Solr indexes are called collections; a collection is the sharded counterpart of a core.

### Zookeeper instances¶

Creating a SolrCloud connector is the same as creating a Solr connector with the only difference in the syntax of the solrUrl parameter:

"solrUrl":"zk://localhost:2181|numShards=2|replicationFactor=2|maxShardsPerNode=3"


zk://localhost:2181 is the host and port of the started Zookeeper instance and the rest are the parameters for creating the SolrCloud collection, delimited with pipes. The supported cluster parameters are:

• numShards

• replicationFactor

• maxShardsPerNode

• autoAddReplicas

• router.name

• router.field

• shards

Note

numShards and replicationFactor are mandatory parameters. When maxShardsPerNode is absent, it defaults to the value of numShards.

For more information on how to use these options, check the SolrCloud’s Collection API documentation.

You can also have multiple Zookeeper instances orchestrating the Solr nodes. They have to be mentioned in the connection string.

"solrUrl":"zk://localhost:2181,zk://localhost:2182|numShards=2|replicationFactor=2|maxShardsPerNode=3"


Note

The Zookeeper instances must be running on the same hosts as in the solrUrl parameter.

### SolrCloud collection configsets¶

Unlike the standard Solr cores, where each core has a /conf directory containing all of its configurations, SolrCloud collections decouple the configuration from the data. The configurations are called configsets and they reside in the Zookeeper instances. Before you create a new collection, you have to upload all your default or custom configurations to Zookeeper under specific names.

Note

Check Command Line Utilities and ConfigSets API from SolrCloud documentation on how to upload configsets.

When creating a SolrCloud connector, you specify the configset name in the copyConfigsFrom parameter. If you do not specify it, the connector looks for a configset with the default name, collection1. As a good practice, upload your default configuration under the name collection1; then, when you create a new connector with the default index configuration, you do not have to specify this parameter. For any other custom configset, set the parameter to the name of that configset, e.g., customConfigset.

Example: Create SolrCloud connector query using a custom configset

PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

INSERT DATA {
solr-index:my_collection solr:createConnector '''
{
"solrUrl": "zk://localhost:2181|numShards=2|replicationFactor=2|maxShardsPerNode=3",
"copyConfigsFrom": "customConfigset",
"types": [
"http://www.ontotext.com/example/wine#Wine"
],
"fields": [
{
"fieldName": "grape",
"propertyChain": [
"http://www.w3.org/2000/01/rdf-schema#label"
]
},
{
"fieldName": "sugar",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasSugar"
],
"multivalued": false
},
{
"fieldName": "year",
"propertyChain": [
"http://www.ontotext.com/example/wine#hasYear"
]
}
]
}
''' .
}
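Once the connector has been created, its state can be inspected from SPARQL. The query below is a minimal sketch that assumes the solr:connectorStatus predicate used for connector status checks; it returns one row per connector instance in the repository, including my_collection if the creation succeeded.

```sparql
PREFIX solr: <http://www.ontotext.com/connectors/solr#>

# List every connector instance together with its reported status.
SELECT ?cntUri ?cntStatus {
    ?cntUri solr:connectorStatus ?cntStatus .
}
```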


## Caveats¶

### Order of control¶

Even though SPARQL per se is not sensitive to the order of triple patterns, the Solr GraphDB Connector expects to receive certain predicates before others so that queries can be executed properly. In particular, predicates that specify the query or query options need to come before any predicates that fetch results.

The diagram in Overview of connector predicates provides a quick overview of the predicates.
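The ordering rule can be illustrated with a short search query. This is a sketch that assumes the my_index instance and its comment field from the entity filter example, together with the solr:query and solr:entities predicates used for connector searches; the query string itself is only an illustration.

```sparql
PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

SELECT ?entity {
    # 1. Bind the search to a connector instance.
    ?search a solr-index:my_index ;
        # 2. Specify the query and any query options first...
        solr:query "comment:Berlin" ;
        # 3. ...and only then the predicates that fetch results.
        solr:entities ?entity .
}
```

Placing solr:entities before solr:query would ask the connector for results before it knows what to search for, which is the situation this caveat warns about.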

### Migrating from GraphDB 9.x¶

GraphDB 10.0 introduces major changes to the filtering mechanism of the connectors. Existing connector instances will not be usable and attempting to use them for queries or updates will throw an error.

If your GraphDB 9.x (or older) connector definitions do not include an entity filter, you can simply repair them.

If your GraphDB 9.x (or older) connector definitions do include an entity filter with the entityFilter option, you need to rewrite the filter with one of the current filter types:

1. Save your existing connector definition.

2. Drop the connector instance.

3. In general, most older connector filters can be easily rewritten using the per-field value filter and top-level document filter. Rewrite the filters as follows:

Rule of thumb:

• If you want to remove individual values, i.e., if the operand is not BOUND(), rewrite the condition as a per-field value filter.

• If you want to remove entire documents, i.e., if the operand is BOUND(), rewrite the condition as a top-level document filter.

So if we take the example:

?location = <urn:Europe> AND BOUND(?location) AND ?type IN (<urn:Foo>, <urn:Bar>)


It needs to be rewritten like this:

• Per-field rule on field location: $this = <urn:Europe>

• Per-field rule on field type: $this IN (<urn:Foo>, <urn:Bar>)

• Top-level document filter: BOUND(?location)

4. Recreate the connector instance using the new definition.
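As a sketch of what the recreated definition could look like for the filter rewritten above, the query below attaches the two per-field rules to their fields and keeps the BOUND() check as the top-level document filter. The instance name, the solrUrl, the types value urn:Thing, and the property chain urn:location are assumptions made for illustration; substitute the values from your saved definition.

```sparql
PREFIX solr: <http://www.ontotext.com/connectors/solr#>
PREFIX solr-index: <http://www.ontotext.com/connectors/solr/instance#>

# Hypothetical instance name, types, and property chains; only the filters come from the example above.
INSERT DATA {
    solr-index:my_index solr:createConnector '''
{
  "solrUrl": "http://localhost:8983/solr",
  "types": ["urn:Thing"],
  "fields": [
    {
      "fieldName": "location",
      "propertyChain": ["urn:location"],
      "valueFilter": "$this = <urn:Europe>"
    },
    {
      "fieldName": "type",
      "propertyChain": ["http://www.w3.org/1999/02/22-rdf-syntax-ns#type"],
      "valueFilter": "$this IN (<urn:Foo>, <urn:Bar>)"
    }
  ],
  "documentFilter": "BOUND(?location)"
}
''' .
}
```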