id	Vol-3194/paper31
wikidataid	Q117344944
title	LPG-based Ontologies as Schemas for Graph DBs
pdfUrl	https://ceur-ws.org/Vol-3194/paper31.pdf
dblpUrl	https://dblp.org/rec/conf/sebd/FerilliRP22
volume	Vol-3194
LPG-based Ontologies as Schemas for Graph DBs
Stefano Ferilli, Domenico Redavid and Davide Di Pierro
University of Bari – Department of Computer Science, Via E. Orabona, 4, Bari, 70125, Italy


Abstract
Graph DBs are an emerging NoSQL technology that is boosting the opportunity of data handling based
on interconnection and processing of single instances, rather than batch processing as usual in
traditional relational DBs. Unlike relational DBs, the most prominent graph DB, Neo4j, is schema-less
and based on the LPG graph model. We propose the definition and use of ontologies as schemas, which
would also enable high-level (logical) automated reasoning on the data. The graph model adopted by
standard approaches to ontologies in Computer Science is incompatible with the LPG model. So, we
propose a technology, called GraphBRAIN, specifically designed to exploit the full representational
power of LPGs while still having a mapping to standard ontological approaches. GraphBRAIN also allows
applying different schemas to one underlying graph, representing different but inter-related views on
the same data, and combining schemas. This paper describes the formalism and outlines its possible
applications. Development and implementation of the technology are ongoing, and a prototype is
available and running.

Keywords
Graph Databases, Ontologies, Labeled Property Graphs




SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
Email: stefano.ferilli@uniba.it (S. Ferilli); domenico.redavid1@uniba.it (D. Redavid); davide.dipierro@uniba.it (D. Di Pierro)
ORCID: 0000-0003-1118-0601 (S. Ferilli)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction
New opportunities in data storage and handling have been brought about by the recent
development of graph databases, a kind of NoSQL DB. There are major differences between
traditional relational DBs and graph DBs. E.g., the latter aim at optimizing element-driven
data browsing rather than batch processing as in the former. Also, the former are based on
the ‘table’ metaphor, while the latter are based on the ‘network’ metaphor. The relevance of
the graph-based approach to DB technology nowadays is witnessed by many big players in
the industry developing their own proprietary, special-purpose solutions: e.g., Google’s
‘Knowledge Graph’, Facebook’s ‘Social Graph’ and Twitter’s ‘Interest Graph’. A more
general-purpose solution is Microsoft Research’s ‘Graph Engine’ (previously known as ‘Trinity’) [1].
The most popular graph DB according to DB-Engines¹ is Neo4j [2]. It is being adopted by
many big companies and governmental organizations for several different and relevant use
cases, including Recommendation, Biology, Artificial Intelligence and Data Analytics, Social
Networks, Data Science and Knowledge Graphs². Neo4j adopts the Labeled Property Graph
(LPG) model [3]. In LPGs, both nodes and arcs may have names (called labels for nodes and
types for arcs) and can store properties represented as key/value maps. Labels usually represent
classes, nodes represent class instances, types represent relationships, and arcs represent
relationship instances. Each node may have many labels, while each arc may have at most one
type. Many arcs, possibly labeled with the same type, may exist between the same pair of nodes.
In Neo4j, nodes and arcs are associated with unique identifiers. Neo4j comes with a powerful
query language (Cypher) and extensive libraries for advanced data manipulation (APOC).

¹ https://db-engines.com/en/ranking
² See https://neo4j.com/use-cases/
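
The LPG model described above can be illustrated with a minimal in-memory sketch. This is not how Neo4j stores data; the class names `Node` and `Arc` and the `_ids` counter are assumptions introduced here only to show the asymmetry between nodes (many labels) and arcs (one type), and the role of unique identifiers.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical id generator standing in for the identifiers assigned by the DB.
_ids = count(1)

@dataclass
class Node:
    labels: set                                   # a node may carry many labels
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))

@dataclass
class Arc:
    source: Node
    target: Node
    type: str                                     # an arc carries a single type
    properties: dict = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_ids))

ada = Node({"Person", "User"}, {"name": "Ada"})
event = Node({"Event"}, {"name": "SEBD 2022"})

# Two arcs of the same type between the same pair of nodes stay distinct,
# because each has its own identifier and its own property map.
a1 = Arc(ada, event, "attended", {"role": "speaker"})
a2 = Arc(ada, event, "attended", {"role": "chair"})
```

Both labels and types are plain strings here, which mirrors the schema-less situation: nothing prevents an application from inventing a new label or type at any time.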
   Neo4j is schema-less: the user may apply any label/type or property to each single node or arc
(only simple ‘constraints’ may be defined to bias the DB content). While ensuring great flexibility,
this causes the lack of a clear semantics for the data, and obviously hampers their interpretability
and interoperability. This motivated us to develop a formalism for expressing data schemas for
LPG-based graph DBs (and specifically Neo4j), so that only data compliant with the schemas
can be stored, as in traditional DBs. In particular, we propose to provide schemas in the form of
formal ontologies. In fact, according to [4], an ontology is “a formal, explicit specification of a
shared conceptualization”, and thus building an ontology and designing the conceptual model of
a DB (e.g., in the form of an E-R diagram) are basically the same activity. Using formal ontologies
as schemas, we may enjoy the advantages of DBMSs (scalability, storage optimization, efficient
handling, mining and browsing of the data, etc.) and LPGs (flexibility, expressive power), while
also enabling high-level logical reasoning on the data. As emphasized in [5], this is
an important feature because formal, automated reasoning is much more powerful than the DB’s
query language. E.g., ontological reasoning tasks include consistency/completeness/correctness
checks, instance retrieval, classification, and query answering [6]; rule-based reasoning enables
multiple inference strategies (e.g., deduction, abduction, argumentation, etc.), and combinations
thereof. In fact, several ready-to-use reasoners are available. This means upgrading from the
‘Data Base’ (DB) perspective to the ‘Knowledge Base’ (KB) perspective, investigated in Artificial
Intelligence (AI). More specifically, by applying an ontology as the data model to the data, one
obtains a so-called Knowledge Graph (KG, a kind of KB) [7]: ontology + data = knowledge
graph [8]. Since the graph model adopted by standard AI approaches to formal ontologies is
partially incompatible with LPGs, in [9] we proposed the GraphBRAIN technology, specifically
designed to fully exploit the representational power of LPGs. The core of this technology is a
formalism for expressing schemas for LPGs (Neo4j) as ontologies. While the structure of the
formalism, and its correspondence to the standard ontological model adopted in the literature,
have been described in [9], here we will describe it from a more practical viewpoint and from a
more DB-oriented perspective, referring to a new, slightly modified version of the formalism.
   In the following, after introducing in Section 2 the basics and related works about formal
ontologies and graph DBs, Section 3 describes our formalism for interfacing the two technologies,
whose opportunities and current status of implementation are discussed in Section 4.


2. Basics and Related Work
A standard formalism for expressing ontologies and KGs is the Web Ontology Language, OWL³,
for which a number of reasoners are available⁴. Operational uses of OWL are typically based on
the Resource Description Framework (RDF)⁵. An RDF Graph is a collection of RDF Triples of the
form (Subject, Predicate, Object) representing directed arcs where the Subject, Predicate, and
Object are atomic values (Uniform Resource Identifiers –URIs– or, in the case of the Object, also
a literal value). Triplestores (or ‘Semantic Graph Databases’) are DBMSs specifically focusing
on RDF data and thus, unlike standard DBMSs, not optimized for generic data handling. Among
them, GraphDB may work schema-less or using an RDF ontology as schema. LPGs (and Neo4j)
provide a more general structure than RDF graphs [5]⁶. Most notably, in LPGs nodes and arcs
may carry information, which ensures a much more compact structure than RDF graphs (the
estimated decrease in number of nodes is of up to one order of magnitude), and improves
efficiency (especially in browsing-intensive tasks such as Social Network Analysis or Graph
Mining algorithms) and readability⁷. In particular, RDF cannot attach properties to instances of
relationships (this may be solved by reification, i.e., turning relationships into entities, at the
cost of an increased number of nodes/arcs and less readability, or by using annotations, which
are not visible to reasoners). Also, RDF cannot distinguish different occurrences of the same
relationship between the same pair of nodes; Neo4j can, by assigning unique identifiers to arcs.

³ https://www.w3.org/OWL/
⁴ http://owl.cs.manchester.ac.uk/tools/list-of-reasoners/
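
The cost of reification mentioned above can be made concrete with a small sketch. It compares one relationship instance carrying two properties in an LPG-like dictionary (a hypothetical structure, not a Neo4j API) with the triples obtained using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The identifiers `ada`, `sebd2022` and `stmt42` are illustrative assumptions.

```python
# One 'attended' relationship instance with two properties.
# LPG view: a single arc; the properties live on the arc itself.
lpg_arc = {
    "id": 42,  # hypothetical arc identifier
    "source": "ada",
    "type": "attended",
    "target": "sebd2022",
    "properties": {"startDate": "2022-06-19", "role": "speaker"},
}

def reify(arc):
    """Turn an LPG-style arc into RDF-style triples by reification."""
    stmt = f"stmt{arc['id']}"  # new resource standing for the relationship instance
    triples = [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", arc["source"]),
        (stmt, "rdf:predicate", arc["type"]),
        (stmt, "rdf:object", arc["target"]),
    ]
    # Each arc property becomes one more triple hanging off the statement node.
    triples += [(stmt, key, value) for key, value in arc["properties"].items()]
    return triples

triples = reify(lpg_arc)
```

A single arc becomes one extra resource plus four structural triples and one triple per property, which is the growth in nodes and arcs that the text refers to.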
   The need for, but limited adoption of, logic-based Knowledge Representation for the development
of KGs is pointed out in [10]. E.g., [11] stores the Freebase KG in Neo4j, but it does not use
ontologies as DB schemas, and focuses on simple querying, not on reasoning. Also, SciGraph⁸
clearly states that creating ontologies and supporting reasoning are not its goals. In contrast,
ontologies are the core of our approach. In investigating the mapping of ontologies or KGs to
LPGs (or to Neo4j specifically), most works in the literature adopt an ‘OWL-centric’ perspective.
While LPGs can obviously store atomic values in their nodes and arcs, we aim at fully exploiting
their representational power and flexibility, and thus our approach is ‘LPG-centric’. E.g., [12]
studies the expressiveness and complexity of the Shape Expression Schema (ShEx) formalism
for RDF. [13] discusses how ontological schemas can be applied to Neo4j graphs whose labeled
edges belong to a number of predetermined classes. Some works limit the portion that can be
mapped. OWL2LPG⁹ identifies specific kinds of queries expressible in Neo4j that should be more
performant than using reasoners, and translates the ontology, not the data. VirtualFlyBrain¹⁰
translates only “a well defined subset” of OWL 2 EL ontologies into Neo4j and back, preserving
entailments and annotations (not the syntactic structure). Other approaches change the LPG
or RDF models to allow a mapping. E.g., [14] redefines the PG model, while OWLStar¹¹ and
the formal mapping proposed in [15] are based on RDF⋆, an extension of RDF (and thus not
compliant with standard reasoners). We are interested in solutions that fully exploit the LPG
model, and in what can be mapped onto standard RDF, and how, so as to reuse available reasoners.

⁵ https://www.w3.org/RDF/
⁶ https://neo4j.com/blog/rdf-triple-store-vs-labeled-property-graph-difference/
⁷ Albeit not directly related to data storage and management, readability may be important when a portion of the graph is to be graphically displayed for humans.
⁸ https://github.com/SciGraph/SciGraph/wiki/Neo4jMapping
⁹ https://protegeproject.github.io/owl2lpg
¹⁰ https://github.com/VirtualFlyBrain/neo4j2owl
¹¹ https://github.com/cmungall/owlstar
¹² https://neo4j.com/blog/ontologies-in-neo4j-semantics-and-knowledge-graphs/

   The main approach adopted in the Neo4j community¹² consists in importing into Neo4j the
RDF triples specifying the ontology as they are. So, the ontology and the data, albeit disjoint,
coexist in the same graph, much as schemas in relational DBs are stored within the DBMS
itself. Since the schema and the data are to be handled in totally different ways, we keep them
apart, as in relational DBMSs, where they are stored in different DBs. Ontological reasoning on
the imported ontology is carried out using Cypher queries. However, no formal discussion is
provided about what kinds of reasoning can be mapped onto graph DB queries. Usually it is quite
simple (e.g., navigation of the subclass hierarchy), and hardly comparable to that provided by
state-of-the-art ontological reasoners. In any case, implementation of these reasoning facilities
is still left to the applications accessing the DB. Also, this solution does not prevent data
that are not compliant with the intended ontology from being inserted into the DB.
   Since data representation constrained to using RDF triples does not necessarily make sense
outside the Automated Reasoning field, we start from the ‘standard’ DB perspective based on
LPGs, and aim at providing the DB with a schema that may also enable formal reasoning on
its contents. In our vision, KB designers must design the data schemas, expressed in the form
of ontologies for LPGs, that will drive all subsequent accesses to the DB. By referring to a
schema, the applications will commit to complying with it, as in traditional databases. As in
Triplestores, this will ensure a tight integration between the data and the schema. As opposed
to Triplestores and most of the cited works, the data/instances (stored in the graph DB) are
kept apart from the schema/ontology (specified in a file external to the DB, using an ontology
representation format). We leverage this separation between the data repository and the data
schema to obtain the additional opportunity of applying different (but compatible) schemas to
the same DB. Indeed, each schema may represent a different, partial view on the same data,
making it possible to limit or expand the possible interactions depending on specific needs, and adding
flexibility to our solution. Again, this is not even thinkable in Triplestores. [16, 17] deal with
automated inference of graph schemas, pointing out the difficulties of such a task. We do not aim at
inferring the DB schema, but at providing graph DB users with a formalism that allows them to specify
schemas and to apply high-level reasoning on the DB data.


3. GraphBRAIN DB Scheme Format
As said, the objective of this work is to propose a formalism to endow LPG-based graph DBs
with a schema that ensures a clear semantics to the information they contain and drives its
management and interpretation. In our approach the schemas are expressed as formal ontologies
whose elements are expressed by elements of the LPG model according to Table 1. We call this
technology GraphBRAIN, and its schemas GBSs (for GraphBRAIN Schemas). GBSs are expressed
as XML files, whose syntax is specified by an associated DTD file. In the GBS formalism, all
names of domains, entities, relationships and attributes may consist of letters or decimal digits
only. We suggest avoiding digits and building multi-word names by juxtaposing their constituent
words, using an uppercase letter for the first letter of each subsequent word.
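
The naming rule above lends itself to a mechanical check. The following is only a sketch: the helper names `is_valid_name` and `compose` are hypothetical, and ‘letters’ is assumed here to mean ASCII letters, which the text does not state.

```python
import re

# Assumption: 'letters' means ASCII letters; names are letters and decimal digits only.
NAME = re.compile(r"[A-Za-z0-9]+")

def is_valid_name(name):
    """True if the name consists of letters or decimal digits only."""
    return NAME.fullmatch(name) is not None

def compose(words, capitalize_first=False):
    """Juxtapose words, uppercasing the first letter of each subsequent word."""
    head, *tail = words
    head = head.capitalize() if capitalize_first else head.lower()
    return head + "".join(word.capitalize() for word in tail)

attribute_name = compose(["part", "number"])
entity_name = compose(["intellectual", "work"], capitalize_first=True)
```

The `capitalize_first` switch is there because, as detailed later in this section, the case of the first letter distinguishes entity names from domain, attribute and relationship names.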

Table 1
Correspondence between ontology, relational DB and LPG elements

OWL*                    Relational DB                   LPG
owl:Ontology            schema                          label
owl:Class               entity table name               label
owl:ObjectProperty      relationship table name         type
owl:Individual          entity table tuple              node
object property URI     relationship table tuple        arc
owl:DatatypeProperty    entity table attribute          node property
                        relationship table attribute    arc property
                        entity table key                node id
                        relationship table key          arc id

* (OWL namespace http://www.w3.org/2002/07/owl)

   In the following we will describe the XML tags and structure of the GBS formalism through a
‘general’ schema running example, including general classes and relationships that are likely
to be reused by most other schemas. An excerpt of the GBS describing the ‘general’ schema
is reported in Figures 1 and 2. Let us first describe the general structure of the files. After the
standard XML file heading, the main tag, enclosing the whole schema, is domain, expressing
in attribute name the name of the domain that the schema is describing (which must start with
a lowercase letter) and in attribute author the author of the schema. Immediately inside this
tag, two sections are specified, introduced by tags entities and relationships and defining the
portion of schema specifying nodes and arcs in the graph, respectively.
   Figure 1 focuses on the specification of nodes in the graph DB. The schema assumes an
implicit universal entity, and starts describing the hierarchy of entities starting from classes at
level 1 (we will call these top entities). Top entities in the schema will be used to label nodes
in the graph¹³. So, all nodes with the same label will be considered as instances of the same
top entity, much like tuples in the same entity table in a relational DB. Top entities should
be selected by the DB designer so as to be described by sufficiently different properties. Each
entity is enclosed in an entity tag, specifying its name in the name attribute. In the GBS formalism,
entity names must start with an uppercase letter. In the ‘general’ schema, the following top
entities are provided for: ‘Artifact’ (any physical product of human labour), ‘Collection’ (any
collection of any kind of entities), ‘ContentDescription’, ‘Document’ (any kind of multimedia
document), ‘Event’, ‘IntellectualWork’, ‘Item’ (specific instances of Artifacts), ‘Organization’,
‘Person’, ‘Place’, and ‘User’. In Figure 1, the description of entity ‘Artifact’ has been partially
exploded as an example of the XML sub-structure associated to entities. We note two sections.

¹³ This is for compliance with arcs, which allow only one type. On nodes, additional labels can be added to specify
domains for which the node is relevant. This does not generate ambiguity, since domain names start with a lowercase
letter, entity names with an uppercase one.

   The section enclosed by tag attributes is mandatory, and specifies the attributes that are
applicable to the top entity (at least one attribute must be specified for each top entity). This
constrains nodes labeled with that top entity to take only these attributes. Each attribute is
described by attributes name (which must start with a lowercase letter in the GBS formalism),
mandatory (saying whether it must be specified for each node), datatype (specifying the type of
the attribute), and distinguishing (meaning that the attribute is useful to distinguish different
items having the same values for mandatory attributes). The set of mandatory and distinguishing
attributes acts as a ‘logical’ key for the entity instances, useful for human users to distinguish
them (syntactically, the node id automatically assigned by Neo4j is sufficient to act as a unique
key). Attribute names ‘specialization’ and ‘notes’ are reserved, since they are automatically
added by GraphBRAIN to all top entities. In the ‘general’ schema, the top entity ‘Artifact’ is
described by two attributes, both of type string: ‘name’ (mandatory) and ‘description’ (optional,
but distinguishing), plus ‘specialization’ and ‘notes’ (an optional string).

<?xml version="1.0"?>
<domain name="general" author="graphbrain" version="1.0">
   <entities>
      <entity name="Artifact">
          <attributes>
             <attribute name="name" mandatory="true" datatype="string"/>
             <attribute name="description" distinguishing="true" mandatory="false" datatype="string"/>
          </attributes>
          <taxonomy>
             <value name="Artwork"> ... </value>
             <value name="Handicraft"> ... </value>
             <value name="IndustrialWork">
                <taxonomy>
                   <value name="Component"> ... </value>
                   <value name="Device">
                       <attributes>
                          <attribute name="partNumber" mandatory="false" datatype="string"/>
                       </attributes>
                   </value>
                   ...
                </taxonomy>
             </value>
          </taxonomy>
      </entity>
      <entity name="Collection"> ... </entity>
      <entity name="ContentDescription"> ... </entity>
      <entity name="Document"> ... </entity>
      <entity name="Event"> ... </entity>
      <entity name="IntellectualWork"> ... </entity>
      <entity name="Item"> ... </entity>
      <entity name="Organization"> ... </entity>
      <entity name="Person"> ... </entity>
      <entity name="Place"> ... </entity>
      <entity name="User"> ... </entity>
   </entities>
   <relationships> ... </relationships>
</domain>

Figure 1: Excerpt of sample GBS file (focus on entities)
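
As an illustration of how the attributes section can constrain node contents, the sketch below checks a property map against the ‘Artifact’ declaration of Figure 1. It is not GraphBRAIN's implementation: the function `check_node` and the error messages are assumptions, and datatype checking is left out.

```python
import xml.etree.ElementTree as ET

# The 'Artifact' entity of the 'general' schema, without its taxonomy.
ARTIFACT = """
<entity name="Artifact">
  <attributes>
    <attribute name="name" mandatory="true" datatype="string"/>
    <attribute name="description" distinguishing="true" mandatory="false" datatype="string"/>
  </attributes>
</entity>
"""

# Reserved attribute names, added to all top entities according to the text.
RESERVED = {"specialization", "notes"}

def check_node(entity_xml, properties):
    """Return the list of violations of the entity's attribute declarations."""
    entity = ET.fromstring(entity_xml)
    # Only the attributes declared directly on the top entity are considered here.
    declared = {a.get("name"): a for a in entity.findall("attributes/attribute")}
    errors = []
    for name, attr in declared.items():
        if attr.get("mandatory") == "true" and name not in properties:
            errors.append(f"missing mandatory attribute '{name}'")
    for name in properties:
        if name not in declared and name not in RESERVED:
            errors.append(f"attribute '{name}' not allowed for {entity.get('name')}")
    return errors

ok = check_node(ARTIFACT, {"name": "Vase", "notes": "donated"})
bad = check_node(ARTIFACT, {"colour": "red"})
```

A real validator would also need to compare each value with the declared datatype and to take the attributes of subclasses into account.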
   The section enclosed by tag taxonomy is optional, and describes the hierarchy of subclasses
for the top class. Each immediate sub-class is enclosed by tag value, reporting its name in the
name attribute. The behavior of this tag is just like the entity tag: it may specify additional
attributes for the subclass (enclosed in an attributes tag), to be added to the attributes of all
its superclasses in the hierarchy, and further immediate sub-classes (enclosed in a taxonomy
tag), recursively. In the ‘general’ schema, the top entity ‘Artifact’ has 3 sub-classes (‘Artwork’,
‘Handicraft’ and ‘IndustrialWork’), the last of which has in turn sub-classes, one of which
(‘Device’) provides for an additional attribute (‘partNumber’, an optional string) with respect to
the top entity. Instances may belong to any top or intermediate class in the hierarchy, and their
most specific class is specified in the specialization attribute (for instances of the top class, it
reports the top class itself). Top class and subclass names must be unique in the whole schema.

<?xml version="1.0"?>
<domain name="general" author="graphbrain" version="1.0">
   <entities> ... </entities>
   <relationships>
      <relationship name="aliasOf" inverse="aliasOf">
          <references>
             <reference subject="Category" object="Category"/>
             <reference subject="Document" object="Document"/>
             <reference subject="User" object="Person"/>
             <reference subject="Person" object="Person"/>
             <reference subject="Place" object="Place"/>
          </references>
      </relationship>
      <relationship name="attended" inverse="attendedBy">
          <references>
             <reference subject="Organization" object="Event"/>
             <reference subject="Person" object="Event"/>
          </references>
          <attributes>
             <attribute name="startDate" mandatory="true" datatype="date"/>
             <attribute name="endDate" mandatory="false" datatype="date"/>
             <attribute name="role" mandatory="false" datatype="string"/>
          </attributes>
      </relationship>
      <relationship name="belongsTo" inverse="includes"> ... </relationship>
      <relationship name="developed" inverse="developedBy"> ... </relationship>
      <relationship name="evolves" inverse="evolvedBy"> ... </relationship>
      <relationship name="expresses" inverse="expressedBy"> ... </relationship>
      <relationship name="instanceOf" inverse="hasInstance"> ... </relationship>
      <relationship name="interactedWith" inverse="interactedWith"> ... </relationship>
      <relationship name="isA" inverse="hasSubclass"> ... </relationship>
      <relationship name="knows" inverse="knownBy"> ... </relationship>
      <relationship name="owned" inverse="ownedBy"> ... </relationship>
      <relationship name="partOf" inverse="hasPart"> ... </relationship>
      <relationship name="produced" inverse="producedBy"> ... </relationship>
      <relationship name="requires" inverse="requiredBy"> ... </relationship>
      <relationship name="wasIn" inverse="hosted"> ... </relationship>
      ...
   </relationships>
</domain>

Figure 2: Excerpt of sample GBS file (focus on relationships)
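
The inheritance of attributes along the taxonomy can be sketched as a recursive walk of the XML. The function `attributes_of` below is hypothetical and operates on a reduced version of the ‘Artifact’ entity in which the elided parts of Figure 1 are simply omitted.

```python
import xml.etree.ElementTree as ET

# Reduced 'Artifact' entity: only the path leading to 'Device' is kept.
ARTIFACT = """
<entity name="Artifact">
  <attributes>
    <attribute name="name" mandatory="true" datatype="string"/>
  </attributes>
  <taxonomy>
    <value name="IndustrialWork">
      <taxonomy>
        <value name="Device">
          <attributes>
            <attribute name="partNumber" mandatory="false" datatype="string"/>
          </attributes>
        </value>
      </taxonomy>
    </value>
  </taxonomy>
</entity>
"""

def attributes_of(node, class_name, inherited=()):
    """Depth-first search for class_name.

    Returns the attribute names of the class, including those of all its
    superclasses, or None if the class is not found below this node.
    """
    own = [a.get("name") for a in node.findall("attributes/attribute")]
    collected = list(inherited) + own
    if node.get("name") == class_name:
        return collected
    for child in node.findall("taxonomy/value"):
        found = attributes_of(child, class_name, collected)
        if found is not None:
            return found
    return None

root = ET.fromstring(ARTIFACT)
device_attributes = attributes_of(root, "Device")
```

Because class names are unique in the whole schema, the first match found by the search is the only one.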
   Figure 2 focuses on the specification of arcs in the graph DB. The schema assumes an implicit
universal relationship, and starts describing the hierarchy of relationships starting from level
1 (we will call these top relationships). Top relationships in the schema will be used to label
arcs in the Neo4j graph (which admits only one type label per arc). So, all arcs with the same
type will be considered as instances of the same relationship, much like tuples in the same
relationship table in a relational DB. As for top entities, top relationships should be selected by the
DB designer so as to refer to relationships described by sufficiently different properties. Each
relationship is enclosed in a relationship tag, specifying its name in the name attribute and the
name of its inverse relationship in the inverse attribute. Relationship (and inverse relationship)
names must start with a lowercase letter in the GBS formalism. Some of the top relationships
provided for by the ‘general’ schema are: ‘aliasOf’, ‘attended’, ‘belongsTo’, ‘developed’, etc. In
Figure 2, the description of relationships ‘aliasOf’ and ‘attended’ has been partially expanded as
an example of the XML sub-structure associated to relationships. We note two sections.

<entity name="Person">
   <attributes>
      <attribute name="surname" mandatory="true" display="true" datatype="string"/>
      <attribute name="name" mandatory="true" display="true" datatype="string"/>
      <attribute name="gender" mandatory="false" datatype="select">
          <values>
             <value name="M"/>
             <value name="F"/>
          </values>
      </attribute>
      <attribute name="bornIn" mandatory="false" datatype="entity" target="Place" />
      <attribute name="birthDate" mandatory="false" datatype="date"/>
      <attribute name="diedIn" mandatory="false" datatype="entity" target="Place" />
      <attribute name="deathDate" mandatory="false" datatype="date"/>
   </attributes>
</entity>

Figure 3: Sample entity with attributes of different types
   The section enclosed by tag references is mandatory, and specifies the subject and object
entity of the relationship (i.e., the admitted pairs of classes for source and sink nodes of the arcs
with that relationship type), using attributes subject and object, respectively. The section enclosed
by tag attributes is as for entities, but it is not mandatory, because the relationship itself is
a kind of property of the two entities it connects. A third (optional) section, enclosed by tag
taxonomy, describes the hierarchy of sub-relationships for the top relationship.
As for the hierarchy of sub-entities, it is recursive, and allows expressing for each sub-relationship
its references, attributes and sub-taxonomy (if any), using the same tags and nested structure.
   In the ‘general’ schema, an arc of type ‘aliasOf’ (to associate nodes that refer to the same
thing) can be added only between pairs of nodes labeled Category and Category, Document
and Document, User and Person, Person and Person, or Place and Place. It has no attributes.
An arc of type ‘attended’ (expressing event attendance) can be added only between pairs of
nodes labeled Organization and Event, or Person and Event. It has 3 attributes, 2 of type date
(‘startDate’, mandatory, and ‘endDate’, optional) and one of type string (‘role’, optional). Neither
of these relationships has a taxonomy of subrelationships.
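
The role of the references section can be illustrated by a check on the admitted subject/object pairs. This is a sketch under simplifying assumptions: `arc_allowed` is a hypothetical helper, each node is represented only by its top-entity label, and the attributes of the relationship are not checked.

```python
import xml.etree.ElementTree as ET

# The references of relationship 'attended' from the 'general' schema.
ATTENDED = """
<relationship name="attended" inverse="attendedBy">
  <references>
    <reference subject="Organization" object="Event"/>
    <reference subject="Person" object="Event"/>
  </references>
</relationship>
"""

def arc_allowed(relationship_xml, subject_label, arc_type, object_label):
    """True if an arc of arc_type may connect nodes with the given labels."""
    rel = ET.fromstring(relationship_xml)
    if rel.get("name") != arc_type:
        return False
    pairs = {
        (ref.get("subject"), ref.get("object"))
        for ref in rel.findall("references/reference")
    }
    return (subject_label, object_label) in pairs

person_attends = arc_allowed(ATTENDED, "Person", "attended", "Event")
place_attends = arc_allowed(ATTENDED, "Place", "attended", "Event")
```

References are directional, so the pair (Event, Person) is rejected for ‘attended’ even though (Person, Event) is admitted.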
   For attributes, GBS admits the following datatypes: integer, real, boolean, string, text,
select, tree, date, entity. Of these, integer, real, boolean, string, text take an atomic value of
the corresponding type, where text is intended for free text of any length, unlike
string, which has a limited maximum length that can be specified in the length attribute. Values
of these types are stored as literal values for the corresponding DB types provided by Neo4j:
Integer and Float (both subtypes of an abstract type Number), Boolean, and String.
   Attributes of type select denote a choice in an enumeration of values, enclosed in a tag values
with each value enclosed in a tag value, specifying the value in the name attribute. GraphBRAIN
always adds a default value ‘Other’ to the list of values of any attribute of type select. This type
is used in Figure 3 for attribute ‘gender’ of entity ‘Person’, which may take values ‘M’, ‘F’ or
‘Other’. Attributes of type tree are similar to attributes of type select, but indicate a choice in a
tree of values. The tree can be described by allowing nested values tags inside value tags. For
values of these types, the corresponding string is stored.

<entities>
   <entity name="Timeline"/>
   <entity name="Year">
      <attributes>
         <attribute name="year" mandatory="true" datatype="integer"/>
      </attributes>
   </entity>
   <entity name="Month">
      <attributes>
         <attribute name="belongsTo" mandatory="true" datatype="entity" target="Year"/>
         <attribute name="month" mandatory="true" datatype="integer"/>
      </attributes>
   </entity>
   <entity name="Day">
      <attributes>
         <attribute name="belongsTo" mandatory="true" datatype="entity" target="Month"/>
         <attribute name="day" mandatory="true" datatype="integer"/>
      </attributes>
   </entity>
</entities>
<relationships>
   <relationship name="belongsTo" inverse="includes">
      <references>
         <reference subject="Year" object="Timeline"/>
      </references>
   </relationship>
   <relationship name="follows" inverse="precedes">
      <references>
         <reference subject="Day" object="Day"/>
         <reference subject="Month" object="Month"/>
         <reference subject="Year" object="Year"/>
      </references>
   </relationship>
</relationships>

Figure 4: Implicit entities and relationships for time handling
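
For attributes of type select, the set of admissible values follows directly from the XML plus the default ‘Other’. A minimal sketch, using the ‘gender’ attribute of Figure 3 and a hypothetical helper `admissible_values` (nested values for type tree are not handled):

```python
import xml.etree.ElementTree as ET

GENDER = """
<attribute name="gender" mandatory="false" datatype="select">
  <values>
    <value name="M"/>
    <value name="F"/>
  </values>
</attribute>
"""

def admissible_values(attribute_xml):
    """List the enumerated values of a select attribute, plus the default 'Other'."""
    attr = ET.fromstring(attribute_xml)
    values = [v.get("name") for v in attr.findall("values/value")]
    return values + ["Other"]

gender_values = admissible_values(GENDER)
```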
   Attributes of type entity denote 1:1 relationships between an instance of the current entity
and an instance of another entity (specified in the target attribute of the tag). So, they are stored
in the DB as arcs connecting the nodes corresponding to these two instances and having the
attribute name as type (note that in our proposed naming policy attribute names start with a
lowercase letter, just like relationship names). E.g., in Figure 3 the birthplace of an entity Person
would be modeled as a ‘bornIn’ attribute of type entity with entity Place as the target. In the
graph, an arc labeled ‘bornIn’ will be added from the Person node to the Place node.
   Finally, albeit Neo4j provides for temporal types, including Date, following [2] GraphBRAIN
models attributes of type date as 1:1 relationships to entities ‘Day’ (with year/month/day values),
‘Month’ (with year/month values), or ‘Year’ (with year values). This allows to specify dates at
different granularity, differently from Neo4j’s Date type. Neo4j provides functions for Date
truncation to Month or Year, but such truncations actually correspond to the first day of the
month or year and thus there is no way to distinguish whether a date like 2020/01/01 actually
refers to the specific day or is a truncation for the month (2020/01) or year (2020). Using attribute
‘belongsTo’ (of type entity), ‘Day’ nodes are connected to the corresponding ‘Month’ nodes,
<?xml version="1.0"?>
<domain name="lam">
   <imports>
      <import schema="general"/>
   </imports>
   <entities> ... </entities>
   <relationships> ... </relationships>
</domain>


Figure 5: Excerpt of sample GBS file (focus on imports)


which in turn are connected to the corresponding ‘Year’ nodes, which are finally all connected
to the ‘Timeline’ nodes. Arcs of type ‘follows’ may be added and maintained between adjacent
days, months or years in the DB, so as to easily extract from the DB time intervals and associated
information. Figure 4 shows the portion of ontology defining temporal elements.
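As a rough sketch of the date modeling described above (an illustration under stated assumptions, not the GraphBRAIN implementation), the function below builds the chain of ‘belongsTo’ arcs for a date given at day, month or year granularity. Representing nodes as tuples and the function name `date_chain` are assumptions made for this example.

```python
from typing import Optional


def date_chain(year: int, month: Optional[int] = None, day: Optional[int] = None):
    """Return (target, arcs) for a date of the given granularity.

    target is the node a date attribute would point to; arcs are the
    'belongsTo' arcs linking that node up to the timeline.
    Nodes are plain tuples here, which is an assumption of this sketch.
    """
    if day is not None and month is None:
        raise ValueError("a day requires a month")
    timeline = ("Timeline",)
    year_node = ("Year", year)
    arcs = [(year_node, "belongsTo", timeline)]
    target = year_node
    if month is not None:
        month_node = ("Month", year, month)
        arcs.append((month_node, "belongsTo", year_node))
        target = month_node
    if day is not None:
        day_node = ("Day", year, month, day)
        arcs.append((day_node, "belongsTo", target))
        target = day_node
    return target, arcs


# The three granularities yield three distinct target nodes, so 2020/01/01,
# 2020/01 and 2020 are no longer confused with one another.
day_target, _ = date_chain(2020, 1, 1)
month_target, _ = date_chain(2020, 1)
year_target, _ = date_chain(2020)
print(day_target, month_target, year_target)
```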
   Each GBS schema is meant to describe one domain. However, GraphBRAIN may apply
several schemas to one graph DB, each representing a specific perspective on the data in the
DB, and providing a partial view of its contents (e.g., in order to limit access to the DB contents
for some users or applications). Shared entities and relationships in those schemas act as
bridges that connect the data from the various domains/perspectives, and enforce
cross-fertilization of the data, which is important for AI applications. GraphBRAIN considers
entities and relationships in different schemas as the same (i.e., shared) if they have the same
name. Shared entities and relationships may have different attributes in different schemas,
reflecting the different perspectives associated with the different domains. However, attributes
that are present in different domains must have the same datatype in all of them.
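The datatype constraint on shared elements can be checked mechanically. The following sketch assumes that each schema has already been loaded into a nested dictionary (schema name, then element name, then attribute name, then datatype). This layout, the function name and the sample attributes are hypothetical; only the domain names ‘tourism’ and ‘lam’ come from the paper.

```python
def datatype_conflicts(schemas):
    """Find attributes of same-named elements whose datatypes disagree.

    schemas maps schema name -> element name -> attribute name -> datatype.
    Returns a list of (element, attribute, {schema: datatype}) tuples.
    """
    seen = {}  # (element, attribute) -> {schema name: datatype}
    for schema_name, elements in schemas.items():
        for element, attributes in elements.items():
            for attribute, datatype in attributes.items():
                seen.setdefault((element, attribute), {})[schema_name] = datatype
    return [
        (element, attribute, by_schema)
        for (element, attribute), by_schema in seen.items()
        if len(set(by_schema.values())) > 1
    ]


# Hypothetical attribute sets: 'Person' is shared, with a different
# perspective in each domain but a consistent datatype for 'name'.
schemas = {
    "tourism": {"Person": {"name": "string", "nationality": "string"}},
    "lam": {"Person": {"name": "string", "role": "string"}},
}
print(datatype_conflicts(schemas))

# Changing the datatype of 'name' in one domain is reported as a conflict.
schemas["lam"]["Person"]["name"] = "integer"
print(datatype_conflicts(schemas))
```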
   Schemas may also be combined, provided that their elements (entities and relationships)
are compatible. By compatible we mean that elements having the same name in the different
schemas must be in the taxonomy of the same top element, and their attributes having the
same name must have the same datatype, too. The other attributes, or non-shared elements,
can be freely defined. Moreover, given two elements 𝐴 and 𝐵, it cannot happen that 𝐴 is a
specialization of 𝐵 in one domain, and vice versa in another. GraphBRAIN can integrate the
taxonomies even if intermediate sub-elements are present in either taxonomy but not in the other.
Schema composition makes it possible to define more complex schemas that describe wider domains (e.g.,
elements in the ‘general’ schema might be reused by a schema concerning libraries, archives and
museums, or ‘lam’ domain). In addition to allowing reuse, this would also foster standardization
of the definitions. Schemas are combined in GBS using an optional section enclosed by tag
imports, located in the XML file before the entities and relationships sections. Each schema to
be imported is specified using tag import, with attribute schema for the name of the schema.
E.g., in Figure 5 the schema ‘lam’ imports schema ‘general’. Schemas are imported in the order
specified by the sequence of import tags. For each of them, its elements are progressively
added in the suitable places of the taxonomies, provided they are compatible with the existing
elements. Finally, elements defined in the entities or relationships sections of the importing
schema are added. Since some elements of the imported schemas may not be needed in the
current domain, delete tags (with attribute element to specify the element to be
deleted) allow them to be removed from the overall ontology.
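The composition procedure above can be approximated as follows. This is a simplified sketch under several assumptions, not GraphBRAIN's algorithm: schemas are dictionaries with hypothetical keys `parents`, `attributes`, `imports` and `deletes`; nested imports are ignored; taxonomies are integrated by taking the union of each element's ancestor sets; and deleting an element simply drops it and any reference to it as an ancestor. The sample entities are invented for illustration.

```python
def ancestors_of(parents, element):
    """Proper ancestors of element, nearest first, in one schema's taxonomy."""
    chain = []
    while parents.get(element) is not None:
        element = parents[element]
        chain.append(element)
    return chain


def merge_layer(merged, layer):
    """Add one schema's elements to the merged ontology, checking compatibility."""
    for element in layer["parents"]:
        chain = ancestors_of(layer["parents"], element)
        top = chain[-1] if chain else element
        entry = merged.setdefault(
            element, {"top": top, "ancestors": set(), "attributes": {}}
        )
        # Same-named elements must be in the taxonomy of the same top element.
        if entry["top"] != top:
            raise ValueError(f"{element}: different top elements")
        # Same-named attributes must have the same datatype.
        for name, datatype in layer["attributes"].get(element, {}).items():
            if entry["attributes"].setdefault(name, datatype) != datatype:
                raise ValueError(f"{element}.{name}: datatype mismatch")
        # The union of ancestor sets absorbs intermediate sub-elements
        # that appear in one taxonomy but not in the other.
        entry["ancestors"].update(chain)
    # A cannot specialize B in one schema while B specializes A in another.
    for element, entry in merged.items():
        for ancestor in entry["ancestors"]:
            if ancestor in merged and element in merged[ancestor]["ancestors"]:
                raise ValueError(f"{element}/{ancestor}: reversed specialization")


def compose(schema, library):
    """Merge imports in order, then the importing schema, then apply deletions."""
    merged = {}
    for name in schema.get("imports", []):
        merge_layer(merged, library[name])
    merge_layer(merged, schema)
    for element in schema.get("deletes", []):
        merged.pop(element, None)
        for entry in merged.values():
            entry["ancestors"].discard(element)
    return merged


# Hypothetical contents for the 'general' and 'lam' schemas.
general = {
    "parents": {"Person": None, "Place": None, "Artifact": None},
    "attributes": {"Person": {"name": "string"}},
}
lam = {
    "imports": ["general"],
    "deletes": ["Place"],
    "parents": {"Artifact": None, "Book": "Artifact", "Person": None},
    "attributes": {"Person": {"name": "string"}, "Book": {"title": "string"}},
}
ontology = compose(lam, {"general": general})
print(sorted(ontology))
```

Storing ancestor sets rather than a single parent per element is a design choice of this sketch: it keeps the merge a plain set union, at the cost of not recording the exact parent-child ordering.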
4. Discussion & Conclusions
Graph DBs are a kind of NoSQL DB that is gaining momentum in research and industrial
applications. Neo4j, the most prominent graph DB currently available, is based on the LPG graph
model. Neo4j is schema-less, which ensures great flexibility, but at the cost of a lack of a clear
semantics for the graph contents, and of losing interpretability and interoperability of the data.
As a solution we propose the use of ontologies as schemas, which as a nice side-effect would
also enable high-level (logical) automated reasoning on the data in addition to standard DB data
manipulation. Since the standard approach to ontologies in Computer Science is incompatible
with the LPG model, we proposed the GraphBRAIN technology, which is specifically LPG-
oriented, while still having a mapping to standard ontological approaches. GraphBRAIN wraps
the graph DB: it takes as input a GBS schema and controls all interactions, allowing the external
applications to manipulate and consult only information items that are compliant with the
schema. GraphBRAIN allows many schemas to be applied to the same graph to express different
domains or perspectives on its content. Shared entities and relationships among these schemas
enforce cross-fertilization of the knowledge from the corresponding domains.
   This paper described the GraphBRAIN schema formalism and the opportunities it provides.
Development and implementation of the technology are ongoing, and a prototype Web application
for conveniently building, browsing and editing both the schemas expressed in this format and
the corresponding data is available (see http://193.204.187.73:8088/GraphBRAIN/, where
the schema used as a running example in this paper can also be seen in full) [18]. Research is
ongoing to further expand the expressive power of the GBS formalism, and to implement its
features. At the same time, schemas are being built and refined for different domains, and data
are being entered for them, so as to support several applications. The current prototype includes
schemas for the domains of ‘tourism’, ‘food’, ‘computing’ (concerning computing devices and
their history), and ‘lam’ (concerning libraries, archives and museums). Practical use cases in
these domains can be found in [19, 20]. Finally, an investigation of AI algorithms that may
leverage the power of the GraphBRAIN framework is being carried out.


References
 [1] B. Shao, H. Wang, Y. Li, Trinity: A distributed graph engine on a memory cloud, in:
     Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
     (SIGMOD’13), 2013, pp. 505–516.
 [2] I. Robinson, J. Webber, E. Eifrem, Graph Databases, 2nd ed., O’Reilly Media, 2015.
 [3] M. Rodriguez, P. Neubauer, Constructions from dots and lines, Bul. Am. Soc. Info. Sci. Tech.
     36 (2010) 35–41.
 [4] R. Studer, R. Benjamins, D. Fensel, Knowledge engineering: Principles and methods, Data
     & Knowledge Engineering 25 (1998) 161–198.
 [5] S. Sakr, A. Bonifati, H. Voigt, A. Iosup, K. Ammar, R. Angles, W. Aref, M. Arenas, M. Besta,
     P. A. Boncz, et al., The Future is Big Graphs: A Community View on Graph Processing
     Systems, Communications of the ACM 64 (2021) 62–71.
 [6] S. Rudolph, Foundations of Description Logics, in: Reasoning Web. Semantic Technologies
     for the Web of Data: 7th International Summer School 2011, Tutorial Lectures, Springer,
     2011, pp. 76–136.
 [7] L. Ehrlinger, W. Wolfram, Towards a definition of knowledge graphs, in: SEMANTICS
     2016: Posters and Demos Track, volume 1695 of CEUR Workshop Proceedings, 2016.
 [8] B. Schrader, What’s the Difference Between an Ontology and a Knowledge Graph? (White
     Paper), Technical Report, Enterprise Knowledge, 2020.
 [9] S. Ferilli, Integration Strategy and Tool between Formal Ontology and Graph Database
     Technology, Electronics 10 (2021).
[10] M. Krötzsch, Ontologies for Knowledge Graphs?, in: Proceedings of the 30th International
     Workshop on Description Logics, volume 1879 of CEUR Workshop Proceedings, CEUR-
     WS.org, 2017. URL: http://ceur-ws.org/Vol-1879/invited2.pdf.
[11] M. Elbattah, M. Roushdy, M. Aref, A.-B. M. Salem, Large-scale ontology storage and
     query using graph database-oriented approach: The case of Freebase, in: 7th International
     Conference on Intelligent Computing and Information Systems (ICICIS), IEEE, 2015, pp.
     39–43.
[12] S. Staworko, I. Boneva, J. E. L. Gayo, S. Hym, E. G. Prud’Hommeaux, H. Solbrig, Complexity
     and Expressiveness of ShEx for RDF, in: 18th International Conference on Database Theory
     (ICDT 2015), 2015.
[13] G. Drakopoulos, A. Kanavos, P. Mylonas, S. Sioutas, D. Tsolis, Towards a framework
     for tensor ontologies over Neo4j: Representations and operations, in: 8th International
     Conference on Information, Intelligence, Systems & Applications, IISA 2017, Larnaca,
     Cyprus, August 27-30, 2017, IEEE, 2017, pp. 1–6.
[14] H. Chiba, R. Yamanaka, S. Matsumoto, G2GML: Graph to Graph Mapping Language for
     Bridging RDF and Property Graphs, in: The Semantic Web – ISWC 2020, Springer, Cham,
     2020, pp. 160–175.
[15] O. Hartig, Foundations to Query Labeled Property Graphs using SPARQL, in: Joint
     Proceedings of the 1st International Workshop On Semantics For Transport and the 1st
     International Workshop on Approaches for Making Data Interoperable co-located with 15th
     Semantics Conference (SEMANTiCS 2019), volume 2447 of CEUR Workshop Proceedings,
     CEUR-WS.org, 2019.
[16] B. Groz, A. Lemay, S. Staworko, P. Wieczorek, Inference of Shape Graphs for Graph
     Databases, in: 25th International Conference on Database Theory (ICDT 2022), Schloss
     Dagstuhl-Leibniz-Zentrum für Informatik, 2022.
[17] H. Lbath, A. Bonifati, R. Harmer, Schema inference for property graphs, in: EDBT
     2021-24th International Conference on Extending Database Technology, 2021, pp. 499–504.
[18] S. Ferilli, D. Redavid, The GraphBRAIN System for Knowledge Graph Management and
     Advanced Fruition, in: Foundations of Intelligent Systems, volume 12117 of LNAI, Springer,
     Berlin, Heidelberg, 2020, pp. 308–317.
[19] S. Ferilli, D. Redavid, An Ontology and a Collaborative Knowledge Base for History of
     Computing, in: 1st International Workshop on Open Data and Ontologies for Cultural
     Heritage (ODOCH-2019), volume 2375 of CEUR Workshop Proceedings, 2019, pp. 49–60.
[20] S. Ferilli, D. Redavid, An Ontology and Knowledge Graph Infrastructure for Digital Library
     Knowledge Representation, in: Digital Libraries: The Era of Big Data and Data Science,
     volume 1177 of CCIS, Springer, Berlin, Heidelberg, 2020, pp. 47–61.