MongoDB Integration

Overview and features

The MongoDB integration feature is a GraphDB plugin allowing users to query MongoDB databases using SPARQL and to execute heterogeneous joins. This section describes how to configure GraphDB and MongoDB to work together.

MongoDB is a document-based database with one of the biggest developer and user communities. It is part of the MEAN technology stack and is designed for scalability and throughput well beyond what GraphDB supports. It fits use cases that combine extreme scalability requirements with a simple data model (i.e., a tree representation of a document and its metadata).

MongoDB is a NoSQL JSON document store and does not natively support joins, SPARQL, or RDF-enabled linked data. The integration between GraphDB and MongoDB is done by a plugin that sends a request to MongoDB and then transforms the result into an RDF model.

Each feature is described in detail below.

Usage

The steps for using MongoDB with GraphDB are:

  • Installing MongoDB;
  • Preparing and loading JSON-LD documents in MongoDB;
  • Configuring GraphDB with MongoDB connection settings by creating an index.

In order to be converted to RDF models, the documents in MongoDB must be valid JSON-LD documents.

The JSON-LD documents are stored in their hierarchical form, which allows more complex search queries over embedded/nested documents.

Each document can be in a separate context. That way, the relation between statements in GraphDB and documents in MongoDB is preserved when parts of the documents are extracted and imported into GraphDB in order to infer statements. Importing parts of documents is an option for future development.

Below is a sample MongoDB document from the LDBC SPB benchmark:

{
        "_id": { "$oid": "5c0fb7f329298f15dc37bb81"},
        "@graph":
        [{
                "@id": "http://www.bbc.co.uk/things/1#id",
                "@type": "cwork:NewsItem",
                "bbc:primaryContentOf":
                [{
                        "@id": "bbcd:3#id",
                        "bbc:webDocumentType": {
                                "@id": "bbc:HighWeb"
                        }
                },
                {
                        "@id": "bbcd:4#id",
                        "bbc:webDocumentType": {
                                "@id": "bbc:Mobile"
                        }
                }],
                "cwork:about":
                [{
                        "@id": "dbpedia:AccessAir"
                },
                {
                        "@id": "dbpedia:Battle_of_Bristoe_Station"
                },
                {
                        "@id": "dbpedia:Nicolas_Bricaire_de_la_Dixmerie"
                },
                {
                        "@id": "dbpedia:Bernard_Roberts"
                },
                {
                        "@id": "dbpedia:Bartolomé_de_Medina"
                },
                {
                        "@id": "dbpedia:Don_Bonker"
                },
                {
                        "@id": "dbpedia:Cornel_Nistorescu"
                },
                {
                        "@id": "dbpedia:Clete_Roberts"
                },
                {
                        "@id": "dbpedia:Mark_Palansky"
                },
                {
                        "@id": "dbpedia:Paul_Green_(taekwondo)"
                },
                {
                        "@id": "dbpedia:Mostafa_Abdel_Satar"
                },
                {
                        "@id": "dbpedia:Tommy_O'Connell_(hurler)"
                },
                {
                        "@id": "dbpedia:Ahmed_Ali_Salaad"
                }],
                "cwork:altText": "thumbnail atlText for CW http://www.bbc.co.uk/context/1#id",
                "cwork:audience": {
                        "@id": "cwork:NationalAudience"
                },
                "cwork:category": {
                        "@id": "http://www.bbc.co.uk/category/Company"
                },
                "cwork:dateCreated": {
                        "@type": "xsd:dateTime",
                        "@value": "2011-02-15T07:13:29.495+02:00"
                },
                "cwork:dateModified": {
                        "@type": "xsd:dateTime",
                        "@value": "2012-02-14T12:43:13.165+02:00"
                },
                "cwork:description": " constipate meant breaking felt glitzier democrat's huskily breeding solicit gargling.",
                "cwork:liveCoverage": {
                        "@type": "xsd:boolean",
                        "@value": "false"
                },
                "cwork:mentions": {
                        "@id": "geonames:2862704/"
                },
                "cwork:primaryFormat":
                [{
                        "@id": "cwork:TextualFormat"
                },
                {
                        "@id": "cwork:InteractiveFormat"
                }],
                "cwork:shortTitle": " closest subsystem merit rebuking disengagement cerebrums caravans conduction disbelieved might.",
                "cwork:thumbnail": {
                        "@id": "bbct:1361611547"
                },
                "cwork:title": "Beckhoff greatly agitators constructed racquets industry restrain spews pitifully undertone stultification."
        }],
        "@id": "bbcc:1#id",
        "@context": {
                "bbcevent": "http://www.bbc.co.uk/ontologies/event/",
                "geo-pos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
                "bbc": "http://www.bbc.co.uk/ontologies/bbc/",
                "time": "http://www.w3.org/2006/time#",
                "event": "http://purl.org/NET/c4dm/event.owl#",
                "music-ont": "http://purl.org/ontology/mo/",
                "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
                "foaf": "http://xmlns.com/foaf/0.1/",
                "provenance": "http://www.bbc.co.uk/ontologies/provenance/",
                "owl": "http://www.w3.org/2002/07/owl#",
                "cms": "http://www.bbc.co.uk/ontologies/cms/",
                "news": "http://www.bbc.co.uk/ontologies/news/",
                "cnews": "http://www.bbc.co.uk/ontologies/news/cnews/",
                "cconcepts": "http://www.bbc.co.uk/ontologies/coreconcepts/",
                "dbp-prop": "http://dbpedia.org/property/",
                "geonames": "http://sws.geonames.org/",
                "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
                "domain": "http://www.bbc.co.uk/ontologies/domain/",
                "dbpedia": "http://dbpedia.org/resource/",
                "geo-ont": "http://www.geonames.org/ontology#",
                "bbc-pont": "http://purl.org/ontology/po/",
                "tagging": "http://www.bbc.co.uk/ontologies/tagging/",
                "sport": "http://www.bbc.co.uk/ontologies/sport/",
                "skosCore": "http://www.w3.org/2004/02/skos/core#",
                "dbp-ont": "http://dbpedia.org/ontology/",
                "xsd": "http://www.w3.org/2001/XMLSchema#",
                "core": "http://www.bbc.co.uk/ontologies/coreconcepts/",
                "curric": "http://www.bbc.co.uk/ontologies/curriculum/",
                "skos": "http://www.w3.org/2004/02/skos/core#",
                "cwork": "http://www.bbc.co.uk/ontologies/creativework/",
                "fb": "http://rdf.freebase.com/ns/",
                "ot": "http://www.ontotext.com/",
                "ldbcspb": "http://www.ldbcouncil.org/spb#",
                "bbcd": "http://www.bbc.co.uk/document/",
                "bbcc": "http://www.bbc.co.uk/context/",
                "bbct": "http://www.bbc.co.uk/thumbnail/"
        }
}
  • The _id key is a MongoDB internal key.
  • The @graph node represents the RDF context in the JSON-LD document.
  • A value of @type xsd:dateTime may additionally carry a @date key with an ISODate(...) value (not present in the sample above). This key is not part of the JSON-LD standard and is ignored when the document is parsed into an RDF model. ISODate is the internal, search-optimized way in which MongoDB stores dates, so adding it makes querying and sorting by the date field faster. This step is optional.

Note

The keys in MongoDB cannot contain “.”, nor start with “$”. Although the JSON-LD standard allows it, MongoDB does not. Therefore, either use namespaces (see the sample above) or encode the “.” and “$” characters. Only the JSON keys are subject to decoding.
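
The key restriction in this note can be checked before loading. The sketch below is only an illustration of the rule as stated here: the function name find_invalid_keys is hypothetical, it performs no encoding, and it does not special-case MongoDB extended-JSON wrappers such as the "$oid" under _id, so run it on the JSON-LD part of a document.

```python
def find_invalid_keys(node, path=""):
    """Return the paths of keys that contain "." or start with "$"."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path} > {key}" if path else key
            if "." in key or key.startswith("$"):
                problems.append(here)
            # Keep walking: nested documents are subject to the same rule.
            problems.extend(find_invalid_keys(value, here))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            problems.extend(find_invalid_keys(item, f"{path}[{position}]"))
    return problems


# Hypothetical document: one key uses a prefix, the other a full IRI.
document = {
    "@id": "bbcc:1#id",
    "@graph": [{
        "@id": "http://www.bbc.co.uk/things/1#id",
        "cwork:title": "prefixed key, accepted",
        "http://www.bbc.co.uk/ontologies/creativework/category": {
            "@id": "http://www.bbc.co.uk/category/Company"
        },
    }],
}

for problem in find_invalid_keys(document):
    print("invalid key:", problem)
```

Only keys are inspected; values such as the IRIs under @id may contain dots freely.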

Setup and maintenance

Installing MongoDB

Setting up and maintaining a MongoDB database is a separate task and must be accomplished outside of GraphDB. See the MongoDB website for details.

Note

Throughout the rest of this document, we assume you have the MongoDB server installed and running on a computer you can access.

Note

The GraphDB integration plugin uses MongoDB Java driver version 3.8. More information about the compatibility between MongoDB Java driver and MongoDB version is available on the MongoDB website.

Creating an index

To configure GraphDB with MongoDB connection settings we need to set:

  • The server where MongoDB is running;
  • The port on which MongoDB is listening;
  • The name of the database you are using;
  • The name of the MongoDB collection you are using;
  • The credentials (optional unless you are using authentication) - the username and password that will let you connect to the database.

Below is a sample query of how to create a MongoDB index:

PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
INSERT DATA {
    inst:spb1000 :service "mongodb://localhost:27017" ;
        :database "ldbc" ;
        :collection "creativeWorks" .
}

Supported predicates:

  • :service - MongoDB connection string;
  • :database - MongoDB database;
  • :collection - MongoDB collection;
  • :user - (optional) MongoDB user for the connection;
  • :password - (optional) the user’s password;
  • :authDb - (optional) the database where the user is authenticated.
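
The index-creation update can also be submitted programmatically. The following is a minimal sketch, assuming GraphDB listens on http://localhost:7200, the repository is named "myrepo" (both hypothetical), and the update is posted to the repository's RDF4J-style statements endpoint. It only prepares the request; sending it requires a running GraphDB instance.

```python
import urllib.parse
import urllib.request

# Assumptions for illustration only: adjust to your installation.
GRAPHDB_URL = "http://localhost:7200"
REPOSITORY = "myrepo"

CREATE_INDEX = """\
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
INSERT DATA {
    inst:spb1000 :service "mongodb://localhost:27017" ;
        :database "ldbc" ;
        :collection "creativeWorks" .
}
"""


def build_update_request(update, base_url=GRAPHDB_URL, repository=REPOSITORY):
    """Prepare (but do not send) a SPARQL update request."""
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/repositories/{repository}/statements",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(request):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


request = build_update_request(CREATE_INDEX)
print(request.get_method(), request.full_url)
```

Calling send(request) against a live server performs the update; a secured GraphDB instance would additionally need authentication headers, which this sketch leaves out.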

Upgrading an index

When upgrading to a newer GraphDB version, it might happen that it contains plugins that are not present in the older version. In this case, the PluginManager disables the newly detected plugin, so you need to enable it by executing the following SPARQL query:

insert data { [] <http://www.ontotext.com/owlim/system#startplugin> "mongodb" }

Then create the index by executing the SPARQL query provided above, and make sure not to delete the database that the plugin is using.

Deleting an index

Deletion of an index is done using the following query:

PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
INSERT DATA {
        inst:spb1000 :drop _:b .
}

Loading sample data

Import the provided cwork1000.json file, which contains 1,000 CreativeWork documents, into the “creativeWorks” collection of the “ldbc” MongoDB database:

mongoimport --db ldbc --collection creativeWorks --file cwork1000.json

Querying MongoDB

Below is a sample query that returns the dateModified value for documents with a specific audience:

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>

SELECT ?creativeWork ?modified WHERE {
        ?search a inst:spb1000 ;
                :find '{"@graph.cwork:audience.@id" : "cwork:NationalAudience"}' ;
                :entity ?entity .
        GRAPH inst:spb1000 {
                ?creativeWork cwork:dateModified ?modified .
        }
}

In a query, use the exact values as they appear in the documents. For example, if full URIs are used instead of “cwork:NationalAudience” or “@graph.cwork:audience.@id”, there will be no matching results.

The :find argument must be a valid BSON document.
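
When the filter is built by a program, serializing it with a JSON library and escaping it for the SPARQL string literal avoids quoting mistakes. The sketch below assumes the spb1000 index from the examples in this section; the helper names sparql_string and build_find_query are hypothetical.

```python
import json

QUERY_TEMPLATE = """\
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>

SELECT ?creativeWork ?modified WHERE {{
    ?search a inst:spb1000 ;
        :find {find} ;
        :entity ?entity .
    GRAPH inst:spb1000 {{
        ?creativeWork cwork:dateModified ?modified .
    }}
}}
"""


def sparql_string(text):
    """Wrap text in a single-quoted SPARQL literal, escaping \\ and '."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_find_query(mongo_filter):
    """Serialize the filter as JSON and embed it as the :find value."""
    # ensure_ascii=False keeps non-ASCII characters as-is instead of
    # emitting \uXXXX sequences into the SPARQL text.
    find = json.dumps(mongo_filter, ensure_ascii=False)
    return QUERY_TEMPLATE.format(find=sparql_string(find))


print(build_find_query({"@graph.cwork:audience.@id": "cwork:NationalAudience"}))
```

The keys and values passed in must still match the stored documents exactly, including the prefixed form.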

Note

The results are returned in a named graph so that GraphDB can determine which variables the plugin should bind: variables that appear inside the named graph are bound using MongoDB, and all others are not. This is a limitation of the plugin API.

Supported predicates:

  • :find - accepts a single BSON document and sets the query. The value is used to call db.collection.find();
  • :project - accepts a single BSON document. The value is used to select the projection for the results returned by :find. Find more info at MongoDB: Project Fields to Return from Query;
  • :aggregate - accepts an array of BSON documents and calls db.collection.aggregate(). This is the most flexible way to make a MongoDB query, as the find() method is just a single phase of the aggregation pipeline. The :aggregate predicate takes precedence over :find and :project: if both :aggregate and :find are used, :find is ignored;
  • :graph - accepts an IRI. Specifies the IRI of the named graph in which the bound variables should be. Its default value is the name of the index itself;
  • :entity - (REQUIRED) returns the IRI of the MongoDB document. If the JSON-LD has a context, the value of @graph.@id is used. In case of multiple values, the first one is chosen and a warning is logged. If the JSON-LD has no context, the value of the @id node is used. Even if the value of this predicate is not used, it must be present in the query in order to inform the plugin that the graph part of the current iteration is completed;
  • :hint - specifies the index to be used when executing the query (calls cursor.hint()).
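
Since find() corresponds to a single stage of the aggregation pipeline, a :find filter can be expressed for :aggregate as a $match stage followed by any further stages. The sketch below illustrates this with plain JSON handling; build_pipeline is a hypothetical helper, and the $limit stage is only an example of an additional stage.

```python
import json


def build_pipeline(mongo_filter, *extra_stages):
    """Express a find() filter as the first stage of an aggregation pipeline."""
    return [{"$match": mongo_filter}, *extra_stages]


pipeline = build_pipeline(
    {"@graph.cwork:audience.@id": "cwork:NationalAudience"},
    {"$limit": 10},  # illustrative extra stage
)

# JSON array to pass as the :aggregate value.
aggregate_value = json.dumps(pipeline)
print(aggregate_value)
```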

Multiple index calls in the same query

Multiple MongoDB calls are supported in the same query. There are two approaches:

  • Place each index call in a separate subselect (Example 1);
  • Have each index call use a different named graph. When querying different indexes, this happens out of the box. Otherwise, use the :graph predicate (Example 2).

Example 1:

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
SELECT ?creativeWork ?modified WHERE {
    {
        SELECT ?creativeWork ?modified {
            ?search a inst:spb1000 ;
                :find '{"@graph.@id" : "http://www.bbc.co.uk/things/1#id"}' ;
                :entity ?creativeWork .
            GRAPH inst:spb1000 {
                ?creativeWork cwork:dateModified ?modified .
            }
        }
    }
    UNION
    {
        SELECT ?creativeWork ?modified WHERE {
            ?search a inst:spb1000 ;
                :find '{"@graph.@id" : "http://www.bbc.co.uk/things/2#id"}' ;
                :entity ?entity .
            GRAPH inst:spb1000 {
                ?creativeWork cwork:dateModified ?modified .
            }
        }
    }
}

Example 2:

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
SELECT ?creativeWork ?modified WHERE {
    {
        ?search a inst:spb1000 ;
                :graph :search1 ;
                :find '{"@graph.@id" : "http://www.bbc.co.uk/things/1#id"}' ;
                :entity ?creativeWork .
        GRAPH :search1 {
                ?creativeWork cwork:dateModified ?modified .
        }
    }
    UNION
    {
        ?search a inst:spb1000 ;
                :graph :search2 ;
                :find '{"@graph.@id" : "http://www.bbc.co.uk/things/2#id"}' ;
                :entity ?entity .
        GRAPH :search2 {
                ?creativeWork cwork:dateModified ?modified .
        }
    }
}

Both examples return the same result.

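
The second approach lends itself to generating the query for an arbitrary list of documents, with one named graph per index call. The sketch below is an illustration under the same assumptions as Example 2 (the spb1000 index and the :searchN graph names); build_union_query is a hypothetical helper.

```python
import json

PREFIXES = """\
PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
"""


def build_union_query(document_ids, index="spb1000"):
    """Build one UNION branch per document, each with its own named graph."""
    if not document_ids:
        raise ValueError("at least one document id is required")
    branches = []
    for number, doc_id in enumerate(document_ids, start=1):
        # Reject characters that would need escaping inside the literal.
        if any(ch in doc_id for ch in "'\"\\"):
            raise ValueError(f"unsupported character in id: {doc_id!r}")
        find = json.dumps({"@graph.@id": doc_id}, ensure_ascii=False)
        graph = f":search{number}"  # distinct graph for every index call
        branches.append(
            "    {\n"
            f"        ?search a inst:{index} ;\n"
            f"            :graph {graph} ;\n"
            f"            :find '{find}' ;\n"
            "            :entity ?entity .\n"
            f"        GRAPH {graph} {{\n"
            "            ?creativeWork cwork:dateModified ?modified .\n"
            "        }\n"
            "    }\n"
        )
    body = "    UNION\n".join(branches)
    return f"{PREFIXES}SELECT ?creativeWork ?modified WHERE {{\n{body}}}\n"


print(build_union_query([
    "http://www.bbc.co.uk/things/1#id",
    "http://www.bbc.co.uk/things/2#id",
]))
```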

Using aggregation functions

MongoDB has a number of aggregation functions such as min, max, size, etc. These functions are called using the :aggregate predicate. The retrieved results must be converted to an RDF model. The example below shows how to retrieve the RDF context of a MongoDB document.

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?s ?o {
    ?search a inst:spb1000 ;
        :aggregate '''[{"$match": {"@graph.@id": "http://www.bbc.co.uk/things/1#id"}},
                {'$addFields': {'@graph.cwork:graph.@id' :  '$@id'}}]''' ;
        :entity ?entity .
    GRAPH inst:spb1000 {
        ?s cwork:graph ?o .
    }
}

The $addFields stage adds a new nested document to the JSON-LD returned from MongoDB. The newly added document is then parsed to the following RDF statement:

<http://www.bbc.co.uk/things/1#id> cwork:graph <http://www.bbc.co.uk/context/1#id>

We retrieve the context of the document using the cwork:graph predicate.

This approach is very flexible but error-prone.

Let’s examine the following query:

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?g1 ?g2 {
    ?search a inst:spb1000 ;
        :aggregate '''[{"$match": {"@graph.@id": "http://www.bbc.co.uk/things/1#id"}},
                {'$addFields': {'@graph.inst:graph.@id' :  '$@id'}}]''' ;
        :entity ?entity .
    GRAPH inst:spb1000 {
        OPTIONAL {
            ?s inst:graph ?g1 .
        }
        ?s <inst:graph> ?g2 .
    }
}

This query looks very similar to the first one, except that the value is written to @graph.inst:graph.@id instead of @graph.cwork:graph.@id. As a result, ?g1 will not get bound. This happens because the @context of the JSON-LD stored in MongoDB declares the cwork prefix but not the inst prefix, so inst:graph is not expanded and is treated as the IRI <inst:graph>. This is why ?g2 gets bound instead.
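
The effect can be reproduced with a simplified model of prefix expansion. The sketch below is not the plugin's parser and covers only the prefix:local form; expand_compact_iri is a hypothetical helper, and the context is a two-entry excerpt of the sample document's @context.

```python
def expand_compact_iri(term, context):
    """Expand prefix:local when the prefix is declared in the context."""
    prefix, separator, local = term.partition(":")
    if separator and prefix in context:
        return context[prefix] + local
    # Undeclared prefix: the term is left untouched.
    return term


# Excerpt of the @context from the sample document; "inst" is not declared.
context = {
    "cwork": "http://www.bbc.co.uk/ontologies/creativework/",
    "bbcc": "http://www.bbc.co.uk/context/",
}

print(expand_compact_iri("cwork:graph", context))
print(expand_compact_iri("inst:graph", context))
```

The first call yields the full creativework IRI, which matches the cwork:graph predicate in SPARQL. The second returns inst:graph unchanged, which SPARQL can only match when written as <inst:graph>.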

Custom fields

Example:

PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>
PREFIX : <http://www.ontotext.com/connectors/mongodb#>

SELECT ?size ?halfSize {
    ?search a inst:spb1000 ;
        :aggregate '''[{"$match": {"@graph.@type": "cwork:NewsItem"}},
                {"$count": "size"},
                {"$project": {"custom.size": "$size", "custom.halfSize": {"$divide": ["$size", 2]}}}]''' ;
        :entity ?entity .
    GRAPH inst:spb1000 {
        ?s inst:size ?size ;
        inst:halfSize ?halfSize .
    }
}

The values are projected as child elements of a custom node. After the JSON-LD is retrieved from MongoDB, a pre-processing step collects all child elements of custom and creates statements with predicates in the <http://www.ontotext.com/connectors/mongodb/instance#> namespace.

Note

The returned values are always string literals.
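
Because the values arrive as strings, a client that needs numbers has to convert them itself. The sketch below parses a response in the standard SPARQL JSON results format; the sample payload and its values are hypothetical, not actual output of the query above.

```python
import json

# Hypothetical response body in the SPARQL 1.1 JSON results format.
SAMPLE_RESPONSE = """
{"head": {"vars": ["size", "halfSize"]},
 "results": {"bindings": [
   {"size": {"type": "literal", "value": "1000"},
    "halfSize": {"type": "literal", "value": "500.0"}}]}}
"""


def to_number(text):
    """Convert a string literal to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def numeric_rows(response_text):
    """Return one dict per result row with all values converted to numbers."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return [
        {name: to_number(cell["value"]) for name, cell in row.items()}
        for row in bindings
    ]


print(numeric_rows(SAMPLE_RESPONSE))
```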

Authentication

All types of authentication can be achieved by setting the credentials in the connection string. However, since storing passwords in plain text is not good practice, the :user, :password, and :authDb predicates are provided. If one of these predicates is used, the other two must be set as well. These predicates set the credentials for SCRAM and LDAP authentication, and the password is stored on disk encrypted with a symmetric algorithm. For x.509 and Kerberos authentication, use the connection string, as no passwords are stored.
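
The all-or-none rule for the credential predicates can be enforced when the index-creation query is generated. The following is a minimal sketch; build_create_index_query is a hypothetical helper, and the user name, password, and authentication database in the usage example are placeholders.

```python
import re

PREFIXES = (
    "PREFIX : <http://www.ontotext.com/connectors/mongodb#>\n"
    "PREFIX inst: <http://www.ontotext.com/connectors/mongodb/instance#>\n"
)


def quote(value):
    """Render a string as a double-quoted SPARQL literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_create_index_query(name, service, database, collection,
                             user=None, password=None, auth_db=None):
    """Build the INSERT DATA query, with credentials set together or not at all."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("index name must be a simple local name")
    supplied = [value is not None for value in (user, password, auth_db)]
    if any(supplied) and not all(supplied):
        raise ValueError("user, password and auth_db must be set together")
    settings = [("service", service), ("database", database),
                ("collection", collection)]
    if all(supplied):
        settings += [("user", user), ("password", password),
                     ("authDb", auth_db)]
    lines = [f":{predicate} {quote(value)}" for predicate, value in settings]
    body = " ;\n        ".join(lines)
    return f"{PREFIXES}INSERT DATA {{\n    inst:{name} {body} .\n}}\n"


# Placeholder credentials for illustration only.
print(build_create_index_query(
    "spb1000", "mongodb://localhost:27017", "ldbc", "creativeWorks",
    user="reader", password="s3cret", auth_db="admin",
))
```

Passing only some of the three credential arguments raises a ValueError instead of producing an incomplete index definition.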