Vol-3194/paper61

From BITPlan ceur-ws Wiki
Jump to navigation Jump to search

Paper

Paper
edit
description  
id  Vol-3194/paper61
wikidataid  Q117344931โ†’Q117344931
title  Investigating Monotone Abstractions
pdfUrl  https://ceur-ws.org/Vol-3194/paper61.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/CimaCLP22
volume  Vol-3194โ†’Vol-3194
session  โ†’

Investigating Monotone Abstractions

load PDF

Investigating Monotone Abstractions
Gianluca Cima2 , Marco Console1 , Maurizio Lenzerini1 and Antonella Poggi1
1
    Sapienza Universitร  di Roma
2
    CNRS & University of Bordeaux


                  Abstract
                  In Ontology-based Data Management (OBDM), an abstraction of a source query q is a query over the
                  ontology capturing the semantics of q in terms of the concepts and the relations available in the ontology.
                  Since a perfect characterisation of a source query may not exist, the notions of best sound and complete
                  approximations of an abstraction have been introduced and studied in the typical OBDM context, i.e.,
                  in the case where the ontology is expressed in DL-Lite, and source queries are expressed as unions of
                  conjunctive queries (UCQs). Interestingly, if we restrict our attention to abstractions expressed as UCQs,
                  even best approximations of abstractions are not guaranteed to exist. Thus, a natural question to ask is
                  whether such limitations affect even larger classes of queries. In this paper, we answer this fundamental
                  question for an essential class of queries, namely the class of monotone queries. We define a monotone
                  query language based on disjunctive Datalog enriched with an epistemic operator, and show that its
                  expressive power suffices for expressing the best approximations of monotone abstractions of UCQs.

                   Keywords
                  Abstraction, Disjunctive Datalog, Monotone queries, Epistemic queries




1. Introduction
In Ontology-based Data Management (OBDM) [1], an ontology, i.e., a formal, logic-based repre-
sentation of a domain of interest, is used to provide a high-level conceptual tool for accessing
and managing the data sources of an information system. Suitable mappings declaratively
specify the relationship between the data at the sources and the elements in the ontology, and
this enables the user to carry out many relevant tasks on data through the lens of the ontology
[2, 3].
   Recent papers [4, 5, 6, 7, 8] address a novel issue in OBDM: starting from a query ๐‘ž๐’ฎ expressed
over the sources, the goal is to find a so-called abstraction of ๐‘ž๐’ฎ [9], i.e., an ontology-based
characterization of ๐‘ž๐’ฎ expressed in terms of the ontology elements, and whose answers coincide
with the answer to the original query, modulo the ontology and the mapping. We encountered
the need of abstraction during a joint project with a public statistical research institute. The
instituteโ€™s departments must publish subsets of the data they gather in the form of semantically
described linked open data. To compute the content of the datasets the departments execute
suitable queries over the data sources mapped to a shared ontology. Notably, when the dataset is
published, it must be documented through a SPARQL query expressed in terms of the ontology.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
" gianluca.cima@u-bordeaux.fr (G. Cima); console@diag.uniroma1.it (M. Console); lenzerini@diag.uniroma1.it
(M. Lenzerini); poggi@diag.uniroma1.it (A. Poggi)
    ยฉ 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR Workshop Proceedings (CEUR-WS.org)
๏ฟฝThis task is currently done manually. The notion of abstraction perfectly captures this scenario
and provides the formal tool for automating the process: given the query over the sources
computing the content of the dataset, the abstraction of such query with respect to the mapping
and the ontology is exactly the SPARQL query to be associated to the open dataset. Besides
the above use case, abstraction can be the appropriate tool in various scenarios. For additional
insights we refer to the references mentioned above.
   The first investigations on abstraction appear in [4, 5]. Both papers point out that the โ€œperfectโ€
abstraction of a union of conjunctive queries (UCQ) expressed over the data source not always
exists, and present algorithms for computing such abstraction in the case where it both exists,
and can be expressed as a UCQ over the ontology. In [4, 6] the notion of (sound and complete)
approximations of the perfect abstraction is introduced, exactly to cope with situations in which
perfectness cannot be achieved. Moreover, both papers make it clear that, for a given class
of queries ๐ถ, one is probably interested in two specific forms of approximations, called ๐ถ-
minimally complete and ๐ถ-maximally sound abstractions. Based on these notions, [6] presents
a thorough analysis of the verification problem (check whether a given query is a complete
or sound abstraction) and the computation problem of UCQ-minimally complete and UCQ-
maximally sound abstractions of UCQ source queries in OBDM systems based on DL-Lite. In [7]
the computation problem is studied in the context of a specific class of non-monotone queries
for expressing abstractions, and it is shown that this class can provide abstractions that are
better than the one in the UCQ class.
   Thus, with the exception of [7], all the results on abstractions have been obtained under the
assumptions that abstractions are expressed as UCQs over the ontology, and many of them
originate from the observations that best approximations in the UCQ class are not guaranteed
to exist. Thus, a natural issue to investigate is whether such limitations affect even larger
classes of queries for expressing abstractions. The main goal of this paper is to address the
following question: do approximations of perfect abstraction that are best in a fundamental
class of queries, namely the class of monotone queries, always exist? Obviously, a related goal
is to derive algorithms for computing approximations of abstractions that are best in the class
of monotone queries, if they exist. Note that the class of monotone queries includes queries
expressible in First-Order Logic and is therefore extremely important.
   In this paper we answer positively to the above-mentioned fundamental question. More
specifically, the contributions of the paper can be summarized as follows. We present a general
framework for abstraction in OBDM, based on the definition of queries as functions from the
logical models of OBDM systems to sets of tuples. The framework includes a new monotone
query language, called Datalogโˆจ    K , based on disjunctive Datalog enriched with inequalities and
an epistemic operator (Section 2). We consider a scenario where the OBDM specification ๐ฝ is
based on DL-LiteRDFS (i.e., the fragment of RDFS expressible in Description Logic), and show that,
in the considered scenario, for any source UCQ ๐‘ž๐’ฎ , the best (sound or complete) approximations
of the ๐ฝ-abstraction of ๐‘ž๐’ฎ in the class of monotone queries always exists and can be expressed
in Datalogโˆจ K (Section 3). As a consequence, if the perfect abstraction exists and is in the class of
monotone queries, then it can be expressed in Datalogโˆจ     K (Section 4).
   This paper is an extended abstract of [10]. Hence, while we assume basic knowledge about
databases [11] and Description Logics (DL) [12], for specific concepts and notations, we refer to
the Preliminaries section of [10].
๏ฟฝ2. Framework
In what follows, we implicitly refer to an OBDM specification ๐ฝ = โŸจ๐’ช, ๐’ฎ, โ„ณโŸฉ, and when we
denote a query by ๐‘ž๐’ช (resp., ๐‘ž๐’ฎ ) we mean that the query is a query for ๐ฝ (resp., a source query),
i.e., is over the signature of the ontology ๐’ช (resp., the schema ๐’ฎ). We follow [6] for the basic
definitions related to abstraction.
                                                         ๐ฝ,๐ท
   We say that ๐‘ž๐’ช is a perfect ๐ฝ-abstraction of ๐‘ž๐’ฎ if ๐‘ž๐’ช     = ๐‘ž๐’ฎ๐ท , for each ๐’ฎ-database ๐ท consistent
with ๐ฝ.
   As the condition for an ontology query to be a perfect abstraction of a source query is a strong
one, it might be very well the case that a perfect abstraction of a source query does not exist. It
is then reasonable to consider weaker notions, such as sound or complete approximations, of
perfectness.
                                                                                 ๐ฝ,๐ท         ๐ฝ,๐ท
   We say that ๐‘ž๐’ช is a complete (resp. sound) ๐ฝ-abstraction of ๐‘ž๐’ฎ if ๐‘ž๐’ฎ๐ท โŠ† ๐‘ž๐’ช        (resp. ๐‘ž๐’ช   โŠ† ๐‘ž๐’ฎ๐ท ),
for each ๐’ฎ-database ๐ท consistent with ๐ฝ.
   Obviously, we might be interested in complete or sound abstractions that approximate ๐‘ž๐’ฎ at
best, at least in the context of a specific class of queries. If โ„’๐’ช is a class of queries, we say that a
query ๐‘ž๐’ช โˆˆ โ„’๐’ช is an โ„’๐’ช -minimally complete (resp., โ„’๐’ช -maximally sound) ๐ฝ-abstraction of ๐‘ž๐’ฎ
if ๐‘ž๐’ช is a complete (resp., sound) ๐ฝ-abstraction of ๐‘ž๐’ฎ and there is no query ๐‘ž๐’ช      โ€ฒ โˆˆ โ„’ such that
                                                                                            ๐’ช
๐‘ž๐’ชโ€ฒ is a complete (resp., sound) ๐ฝ-abstraction of ๐‘ž and ๐‘ž โ€ฒ โŠ ๐‘ž (resp., ๐‘ž โŠ ๐‘ž โ€ฒ ).
                                                       ๐’ฎ        ๐’ช   ๐ฝ ๐’ช            ๐’ช   ๐ฝ ๐’ช
   We now introduce a new language, called Datalogโˆจ          K , for expressing queries over OBDM
specifications. The language is based on disjunctive Datalog, and is used in this paper for
expressing abstractions. The basic component of a Datalogโˆจ           K query is a rule. Assume two
disjoint and countably infinite sets of predicates โ„ฐ and ๐’ซ, called extensional and intensional,
respectively. A Datalogโˆจ  K rule has one of the following forms:

    โ€ข The typical form of disjunctive Datalog, i.e.,

                                          ยฏ ) โ†’ ๐›ผ1 (๐‘ฅ
                                        ๐›พ(๐‘ฅ         ยฏ 1 ) โˆจ . . . โˆจ ๐›ผ๐‘› (๐‘ฅ
                                                                        ยฏ๐‘›)                                 (1)

      where ๐›พ(๐‘ฅ ยฏ ) is a conjunction of relational atoms on the predicates of ๐’ซ with โƒ—๐‘ฅ as variables,
                                    ยฏ ๐‘– ) is a single relational atom whose predicate is in ๐’ซ such that
      and for ๐‘– = 1, . . . , ๐‘›, ๐›ผ๐‘– (๐‘ฅ
      ยฏ๐‘– โŠ† ๐‘ฅ
      ๐‘ฅ    ยฏ,
    โ€ข A new form specified as follows

                                     ยฏ , ๐‘งยฏ) โˆง ๐œ‰(๐‘ฅ
                             K(โˆƒ๐‘งยฏ.๐œ‘(๐‘ฅ           ยฏ )) โ†’ ๐œ“1 (๐‘ฅ
                                                            ยฏ 1 ) โˆจ . . . โˆจ ๐œ“๐‘› (๐‘ฅ
                                                                                ยฏ๐‘›)                         (2)

      where ๐œ‘ is a conjunction of relational atoms over โ„ฐ, ๐œ‰(๐‘ฅ           ยฏ ) is a conjunction of inequality
      atoms involving only variables from ๐‘ฅ        ยฏ and for ๐‘— = 1, . . . , ๐‘›, ๐œ“๐‘— = โˆƒ๐‘ฆยฏ๐‘— .๐›พ๐‘— (๐‘ฅ
                                                                                              ยฏ ๐‘— , ๐‘ฆยฏ๐‘— ), where
      ๐›พ๐‘— is a conjunction of relational atoms on ๐’ซ. When ๐‘ฅ        ยฏ ๐‘— contains only variables ๐‘ฅ   ยฏ occurring
      in ๐œ‘(๐‘ฅยฏ , ๐‘งยฏ), for ๐‘— = 1, . . . , ๐‘›, we say that the rule is safe.
   An ๐‘›-ary Datalogโˆจ                                                                    โˆจ
                    K query ๐‘ž๐’ช over an OBDM specification ๐ฝ is a finite set of DatalogK rules
whose extensional predicates coincide with the alphabet of ๐’ช, and whose intensional predicates
include a special ๐‘›-ary predicate ๐ด๐‘›๐‘ . We say that ๐‘ž๐’ช is safe if all of its rules are safe. The
semantics of ๐‘ž๐’ช is provided relative to an OBDM system. Given an OBDM system โŸจ๐ฝ, ๐ทโŸฉ, an
interpretation for ๐‘ž๐’ช is a pair ๐ผ = (๐‘š๐‘œ๐‘‘(โŸจ๐ฝ, ๐ทโŸฉ), ๐‘“ ), where ๐‘“ is a first-order interpretation
๏ฟฝ(with domain Const) for the predicates in ๐’ซ. As usual, we may also see ๐‘“ as the set of facts
{๐‘(๐‘ยฏ) | ยฏ๐‘ โˆˆ ๐‘๐‘“ }. We now define when ๐ผ satisfies a Datalogโˆจ
                                                            K rule.

    โ€ข ๐ผ satisfies a rule of the form (1) if the first-order formula โˆ€๐‘ฅ      ยฏ ) โ†’ ๐›ผ1 (๐‘ฅ
                                                                       ยฏ .๐›พ(๐‘ฅ         ยฏ 1 )โˆจ. . .โˆจ๐›ผ๐‘› (๐‘ฅ
                                                                                                      ยฏ๐‘›)
      is true in ๐‘“ ,
    โ€ข ๐ผ satisfies a rule of the form (2) if for all tuples ยฏ๐‘ of constants in Const, the fact that the
      first-order formula โˆƒ๐‘งยฏ.๐œ‘(๐‘ยฏ, ๐‘งยฏ) โˆง ๐œ‰(๐‘ยฏ) is satisfied by every model in mod (โŸจ๐ฝ, ๐ทโŸฉ) implies
      that ๐‘“ satisfies the first-order formula โˆƒ๐‘ฆยฏ๐‘— .๐›พ๐‘— (๐‘ยฏ๐‘— , ๐‘ฆยฏ๐‘— ), for some ๐‘— = 1, . . . , ๐‘›. Observe
      that, if the rule is unsafe, ยฏ๐‘๐‘— may contain constants of Const that do not occur in ยฏ๐‘.
  An interpretation ๐ผ for ๐‘ž๐’ช is called a model of ๐‘ž๐’ช if all the rules of ๐‘ž๐’ช are satisfied by ๐ผ. Finally,
we define the notion of answers to an ๐‘›-ary Datalogโˆจ   K query ๐‘ž๐’ช w.r.t. an OBDM system โŸจ๐ฝ, ๐ทโŸฉ,
            ๐ฝ,๐ท
denoted by ๐‘ž๐’ช   , as follows: {๐‘ยฏ โˆˆ Dom(๐ฝ, ๐ท, ๐‘ž๐’ช )๐‘› | ยฏ๐‘ โˆˆ ๐ด๐‘›๐‘ ๐‘“ for each model (๐‘š๐‘œ๐‘‘(โŸจ๐ฝ, ๐ทโŸฉ), ๐‘“ ) of ๐‘ž๐’ช }.
  Let ๐ฝ = โŸจ๐’ช, ๐’ฎ, โ„ณโŸฉ be an OBDM specification such that ๐’ช = โˆ…, ๐’ฎ = {๐‘ 1 /2, ๐‘ 2 /1, ๐‘ 3 /1} and
โ„ณ = {m1 , m2 , m3 , m4 }, where:
    m1 :     โˆ€๐‘ฅ1 , ๐‘ฅ2 .๐‘ 1 (๐‘ฅ1 , ๐‘ฅ2 )   โ†’     E(๐‘ฅ1 , ๐‘ฅ2 )
    m2 :     โˆ€๐‘ฅ.๐‘ 1 (๐‘ฅ, ๐‘ฅ)              โ†’     SN(๐‘ฅ)
    m3 :     โˆ€๐‘ฅ.๐‘ 2 (๐‘ฅ)                 โ†’     โˆƒ๐‘ง.E(๐‘ฅ, ๐‘ง)
    m4 :     โˆ€๐‘ฅ.๐‘ 3 (๐‘ฅ)                 โ†’     E(๐‘ฅ, ๐‘ฅ)
The mapping โ„ณ of ๐ฝ establishes how predicates in the schema ๐’ฎ relate to the ontology
predicates E (which stands for edge) and SN (which stands for special node).
   Let us now consider the following safe Datalogโˆจ     K query ๐‘ž๐’ช over ๐ฝ. ๐‘ž๐’ช returns all the pairs
(๐‘ฃ1 , ๐‘ฃ2 ) of special nodes that are known to be distinct and such that there is a path from ๐‘ฃ1 to ๐‘ฃ2
passing only for nodes known to be special:
    K(E(๐‘ฅ1 , ๐‘ฅ2 ) โˆง SN(๐‘ฅ1 ) โˆง SN(๐‘ฅ2 ))           โ†’     ๐‘‡1 (๐‘ฅ1 , ๐‘ฅ2 )
    K(SN(๐‘ฅ1 ) โˆง SN(๐‘ฅ2 ) โˆง ๐‘ฅ1 ฬธ= ๐‘ฅ2 )             โ†’     ๐‘‡2 (๐‘ฅ1 , ๐‘ฅ2 )
    ๐‘‡1 (๐‘ฅ1 , ๐‘ฆ) โˆง ๐‘‡1 (๐‘ฆ, ๐‘ฅ2 )                    โ†’     ๐‘‡1 (๐‘ฅ1 , ๐‘ฅ2 )
    ๐‘‡1 (๐‘ฅ1 , ๐‘ฅ2 ) โˆง ๐‘‡2 (๐‘ฅ1 , ๐‘ฅ2 )                โ†’     Ans(๐‘ฅ1 , ๐‘ฅ2 )

   The following proposition shows that Datalogโˆจ    K is a monotone query language.
   Every Datalogโˆจ K query  over ๐ฝ is in M ๐ฝ , for every OBDM specification ๐ฝ.

   The semantics should make it clear that K is the knowledge operator in the S5 epistemic
logic: the formula K๐ด should be read as โ€œ๐ด is known (i.e., logically implied) by the systemโ€
[13]. Therefore, when accessing the information modeled by โŸจ๐ฝ, ๐ทโŸฉ, a Datalogโˆจ     K query extracts
what is known by the system, and this characteristic is crucial for not falling into undecidability
resulting from using Datalog rules jointly with Description Logics (see [14, 15]), as stated in the
following proposition. Let ฮฃ be an OBDM system. Answering safe Datalogโˆจ        K queries w.r.t. ฮฃ is
decidable if and only if answering CQs w.r.t. ฮฃ is decidable .
                                                             1

   Although our framework is general enough to consider any DL for expressing ontologies
and any query language for expressing source queries, in the rest of this paper we will carry
out our investigation in the following setting: (๐‘–) ontologies are expressed in DL-LiteRDFS , and
(๐‘–๐‘–) source queries are expressed as UCQs.
    1
    With answering we implicitly refer to the associated recognition problem, i.e., check whether a tuple is in the
answer to a query.
๏ฟฝ   At this point, one may wonder whether Datalogโˆจ  K is the right language to express monotone
abstractions in this setting. While a thorough analysis of the language is outside the scope of
the present paper, the following proposition provides a positive answer to this question, at least
from the computational point of view.
   In our setting, (๐‘–) answering safe DatalogโˆจK queries is in coNP in data complexity, and (๐‘–๐‘–)
there exists an OBDM specification ๐ฝ and a CQ ๐‘ž๐’ฎ such that, given an ๐’ฎ-database ๐ท, answering
the M-maximally sound ๐ฝ-abstraction of ๐‘ž๐’ฎ is coNP-hard in data complexity.
   To ease the presentation, from now on we assume that mappings and source queries do not
mention constants. However, all our results can be straightforwardly adapted to the case where
constants are allowed.


3. Monotone approximations of abstractions
In this section we investigate the problem of the existence of M-minimally complete and M-
maximally sound abstractions, in the restricted setting specified at the end of the previous
section. In particular, we start by showing that M-minimally complete abstractions of UCQ
source queries always exist and can be expressed in Datalogโˆจ            K . Then, because of the lack of
space, we will just mention the corresponding result on M-maximally sound abstractions.
   Given a CQ ๐‘ž๐’ฎ = {๐‘ฅ                 ยฏ , ๐‘ฆยฏ)}, the subroutine SaturateQ(๐‘ž๐’ฎ ) computes a UCQฬธ= in the
                            ยฏ | โˆƒ๐‘ฆยฏ.๐œ‘(๐‘ฅ
following way: for each possible unifier ๐œ‡ on the variables in ๐‘ฅ       ยฏ โˆช ๐‘ฆยฏ such that ๐œ‡(๐‘ฅ) โˆˆ ๐‘ฅ    ยฏ for each
๐‘ฅโˆˆ๐‘ฅ    ยฏ , SaturateQ(๐‘ž๐’ฎ ) contains a query obtained from ๐œ‡(๐‘ž๐’ฎ ) by adding the inequality atom
๐‘ก1 ฬธ= ๐‘ก2 for each pair of distinct variables ๐‘ก1 , ๐‘ก2 occurring in ๐œ‡(๐‘ž๐’ฎ ). For a UCQ ๐‘ž๐’ฎ , we denote
by SaturateQ(๐‘ž๐’ฎ ) the UCQฬธ= obtained by applying SaturateQ(๐‘ž) to each disjunct ๐‘ž of ๐‘ž๐’ฎ . We
write each CQฬธ= ๐‘ž generated by SaturateQ(๐‘ž๐’ฎ ) as ๐‘ž = {๐‘ฅ         ยฏ | โˆƒ๐‘ฆยฏ.๐œ‘(๐‘ฅยฏ , ๐‘ฆยฏ) โˆง ๐œ‰(๐‘ฅ
                                                                                       ยฏ , ๐‘ฆยฏ)}, where ๐œ‘(๐‘ฅ  ยฏ , ๐‘ฆยฏ)
and ๐œ‰(๐‘ฅ   ยฏ , ๐‘ฆยฏ) are the conjunctions of the relational atoms over ๐’ฎ and of the inequality atoms,
respectively, occurring in the body of ๐‘ž.
   Moreover, for an OBDM specification ๐ฝ = โŸจ๐’ช, ๐’ฎ, โ„ณโŸฉ and a CQฬธ= ๐‘ž = {๐‘ฅ                       ยฏ | โˆƒ๐‘ฆยฏ.๐œ‘(๐‘ฅ
                                                                                                       ยฏ , ๐‘ฆยฏ) โˆง
                                                                 โˆจ
  ยฏ , ๐‘ฆยฏ)} over ๐’ฎ, we denote by ๐‘Ÿ๐‘ž the following DatalogK rule of form (2):
๐œ‰(๐‘ฅ

                                                    ยฏ , ๐’ดยฏ )) โ†’ Ans(๐‘ฅ
                                ๐‘Ÿ๐‘ž = K(โˆƒ๐‘งยฏ.โ„ณ(๐‘ž) โˆง ๐œ‰(๐‘ฅ               ยฏ ),

where (i) โ„ณ(๐‘ž) is computed by simply ignoring the inequality atoms and chasing the set of
relational atoms occurring in the body of ๐‘ž; (ii) ๐’ดยฏ โŠ† ๐‘ฆยฏ is the subset of the existential variables of
๐‘ž occurring in โ„ณ(๐‘ž); (iii) ๐‘งยฏ are the fresh variables introduced when computing โ„ณ(๐‘ž); and (iv)
๐œ‰(๐‘ฅยฏ , ๐’ดยฏ ) is the conjunction of the inequality atoms obtained from ๐œ‰(๐‘ฅ ยฏ , ๐‘ฆยฏ) by removing all those
atoms of the form ๐‘ฆ ฬธ= ๐‘ก and ๐‘ก ฬธ= ๐‘ฆ in which ๐‘ฆ is an existential variable occurring in ๐‘ฆยฏ but not in
ยฏ (i.e., not in โ„ณ(๐‘ž)) and ๐‘ก is any other possible variable. Observe that the epistemic operator
๐’ด
is exploited to bind the existential variables coming from ๐‘ž. This is achieved by pushing the
subset ๐’ดยฏ of the existential variables ๐‘ฆยฏ of ๐‘ž occurring in โ„ณ(๐‘ž) inside the K operator.
   We are now ready to present the algorithm M-MinComplete for computing the M-minimally
complete ๐ฝ-abstractions. Given an OBDM specification ๐ฝ = โŸจ๐’ช, ๐’ฎ, โ„ณโŸฉ and a UCQ ๐‘ž๐’ฎ over ๐’ฎ
such that SaturateQ(๐‘ž๐’ฎ ) = ๐‘ž1 โˆช . . . โˆช ๐‘ž๐‘› , M-MinComplete(๐ฝ, ๐‘ž๐’ฎ ) outputs the Datalogโˆจ       K query
๐‘ž๐’ช = {๐‘Ÿ๐‘ž1 , . . . , ๐‘Ÿ๐‘ž๐‘› } over ๐ฝ.
๏ฟฝ    Consider the OBDM specification ๐ฝ illustrated in Example 2 and the CQ ๐‘ž๐’ฎ = {(๐‘ฅ) |
โˆƒ๐‘ฆ.๐‘ 1 (๐‘ฅ, ๐‘ฆ)} over ๐’ฎ. One can verify that M-MinComplete(๐ฝ, ๐‘ž๐’ฎ ) returns the following safe
Datalogโˆจ K query ๐‘ž๐’ช over ๐ฝ asking for all those nodes ๐‘ฃ such that either ๐‘ฃ is connected to a node
๐‘ฃ โ€ฒ known to be different from ๐‘ฃ or ๐‘ฃ is a special node with a self-loop:
    K(E(๐‘ฅ, ๐‘ฆ) โˆง ๐‘ฅ ฬธ= ๐‘ฆ)        โ†’   Ans(๐‘ฅ)
    K(E(๐‘ฅ, ๐‘ฅ) โˆง SN(๐‘ฅ))         โ†’   Ans(๐‘ฅ)
Note that ๐‘ž๐’ช is a better complete approximation than the query {(๐‘ฅ) | โˆƒ๐‘ฆ.E(๐‘ฅ, ๐‘ฆ)}, which is
the UCQ-minimally complete ๐ฝ-abstraction of ๐‘ž๐’ฎ [6].
   M-MinComplete(๐ฝ, ๐‘ž๐’ฎ ) terminates and returns the unique (up to ๐ฝ-equivalence) M-minimally
complete ๐ฝ-abstraction of ๐‘ž๐’ฎ . Furthermore, we observe that the result of M-MinComplete(๐ฝ, ๐‘ž๐’ฎ )
is independent from the assertions occurring in the ontology of the OBDM specification ๐ฝ.
Similar results can be obtained for OBDM specifications based on more expressive Horn DL
ontologies.
   Before concluding this section, we observe that M-MinComplete(๐ฝ, ๐‘ž๐’ฎ ) may return unsafe
Datalogโˆจ  K queries. Nevertheless, these queries enjoy nice computational properties in our
setting.
   Let ๐‘ž๐’ช be a Datalogโˆจ  K query with only rules of the form K(โˆƒ๐‘ง        ยฏ , ๐‘งยฏ) โˆง ๐œ‰(๐‘ฅ
                                                                     ยฏ.๐œ‘(๐‘ฅ           ยฏ )) โ†’ Ans(๐‘ฅ
                                                                                                ยฏ ๐‘Ž ).
In our setting, (i) answering ๐‘ž๐’ช is in PTime in data complexity, and (ii) if ๐‘ž๐’ช is safe, then it is
                                                      ๐ฝ,๐ท
possible to compute a UCQฬธ= ๐‘ž๐‘Ÿ over ๐’ฎ such that ๐‘ž๐’ช        = ๐‘ž๐‘Ÿ๐ท , for each ๐’ฎ-database ๐ท.
   As for best sound approximations of abstractions, it can be shown (cf. details in [10]) that,
similarly to the case of best complete abstractions, an algorithm exists that, given a UCQ ๐‘ž๐’ฎ ,
terminates and computes the unique (up to ๐ฝ-equivalence) M-maximally sound ๐ฝ-abstraction
of ๐‘ž๐’ฎ , expressed as a Datalogโˆจ  K query. Thus, we have the following:
   Given a UCQ ๐‘ž๐’ฎ , both the unique (up to ๐ฝ-equivalence) M-minimally complete and the
M-maximally sound ๐ฝ-abstraction of ๐‘ž๐’ฎ always exist and can be expressed as Datalogโˆจ        K queries.
Consider the OBDM specification ๐ฝ illustrated in Example 2. One can verify that the following set
โ„› of safe Datalogโˆจ  K rules is the unique (up to ๐ฝ-equivalence) M-maximally sound ๐ฝ-abstraction
of ๐‘ž๐’ฎ :
  K(E(๐‘ฅ1 , ๐‘ฅ2 ) โˆง ๐‘ฅ1 ฬธ= ๐‘ฅ2 )   โ†’     ๐‘ 1 (๐‘ฅ1 , ๐‘ฅ2 )
  K(E(๐‘ฅ, ๐‘ฅ))                   โ†’     ๐‘ 3 (๐‘ฅ) โˆจ ๐‘ 1 (๐‘ฅ, ๐‘ฅ)
  K(โˆƒ๐‘ง.E(๐‘ฅ, ๐‘ง))                โ†’     ๐‘ 2 (๐‘ฅ) โˆจ โˆƒ๐‘ฆ.๐‘ 1 (๐‘ฅ, ๐‘ฆ) โˆจ ๐‘ 3 (๐‘ฅ)
  K(SN(๐‘ฅ))                     โ†’     ๐‘ 1 (๐‘ฅ, ๐‘ฅ)


4. Perfect Abstractions
It follows from Theorem 3 that either the perfect abstraction of a source UCQ can be expressed
in Datalogโˆจ K , or it cannot be expressed as a monotone query (if it exists at all). We now present
an algorithm that, given an OBDM specification ๐ฝ and a source UCQ ๐‘ž๐’ฎ , returns the perfect
๐ฝ-abstraction of ๐‘ž๐’ฎ , if and only if it exists and is in M. To this aim, we make use of Proposition 3,
and refer to the ๐‘ž๐‘Ÿ defined in that proposition as the rewriting of ๐‘ž๐’ช w.r.t. ๐ฝ.
   Our algorithm, that we call M-Perfect, goes as follows. Given an OBDM specification
๐ฝ = โŸจ๐’ช, ๐’ฎ, โ„ณโŸฉ and a UCQ ๐‘ž๐’ฎ over ๐’ฎ as input: if (i) ๐‘ž๐’ช = M-MinComplete(๐ฝ, ๐‘ž๐’ฎ ) is safe and
๏ฟฝ(ii) ๐‘ž๐‘Ÿ โŠ‘๐’ฎ ๐‘ž๐’ฎ ; then return ๐‘ž๐’ช ; otherwise, report โ€œno perfect ๐ฝ-abstraction of ๐‘ž๐’ฎ is in Mโ€.
    In Example 3, M-Perfect(๐ฝ, ๐‘ž๐’ฎ ) returns ๐‘ž๐’ช , which is the perfect ๐ฝ-abstraction of ๐‘ž๐’ฎ .
    We conclude this section by establishing termination and correctness of the M-Perfect
algorithm.
    M-Perfect(๐ฝ, ๐‘ž๐’ฎ ) terminates and returns the unique (up to ๐ฝ-equivalence) perfect ๐ฝ-abstraction
of ๐‘ž๐’ฎ if and only if such an abstraction can be expressed in M.


5. Conclusion
We presented a thorough study of monotone abstractions of UCQs in OBDM systems. We
proved that best approximations of such abstractions always exist and introduced a query
language, Datalogโˆจ K , that captures them. Directions for future work are many. In the context
of monotone abstractions, we would like to investigate the case of more expressive ontology
languages, e.g., DL-Liteโ„› , as well as more expressive source query languages, e.g., unions of
conjunctive queries with inequalities and disjunctive Datalog. Finally, the problem of checking
whether given best approximations are expressible in simpler and more user friendly languages
remains open.


Acknowledgements
This work has been partially supported by the ANR AI Chair INTENDED (ANR-19-CHIA-0014),
by MIUR under the PRIN 2017 project โ€œHOPEโ€ (prot. 2017MMJJRE), and by the EU under the
H2020-EU.2.1.1 project TAILOR, grant id. 952215.

References
 [1] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, Link-
     ing data to ontologies, Journal on Data Semantics X (2008) 133โ€“173. doi:10.1007/
     978-3-540-77688-8\_5.
 [2] M. Lenzerini, Managing data through the lens of an ontology, AI Magazine 39 (2018)
     65โ€“74.
 [3] G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, Using ontologies for semantic
     data integration, in: A Comprehensive Guide Through the Italian Database Research Over
     the Last 25 Years, 2018, pp. 187โ€“202.
 [4] G. Cima, Preliminary results on ontology-based open data publishing, in: Proceedings of
     the Thirtieth International Workshop on Description Logics (DL 2017), volume 1879 of
     CEUR Electronic Workshop Proceedings, 2017.
 [5] C. Lutz, J. Marti, L. Sabellek, Query expressibility and verification in ontology-based data
     access, in: Proceedings of the Sixteenth International Conference on the Principles of
     Knowledge Representation and Reasoning (KR 2018), 2018, pp. 389โ€“398.
 [6] G. Cima, M. Lenzerini, A. Poggi, Semantic characterization of data services through
     ontologies, in: Proceedings of the Twenty-Eighth International Joint Conference on
     Artificial Intelligence (IJCAI 2019), 2019, pp. 1647โ€“1653.
๏ฟฝ [7] G. Cima, M. Lenzerini, A. Poggi, Non-monotonic ontology-based abstractions of data
     services, in: Proceedings of the Seventeenth International Conference on Principles of
     Knowledge Representation and Reasoning (KR 2020), 2020, pp. 243โ€“252.
 [8] G. Cima, M. Console, M. Lenzerini, A. Poggi, Abstraction in Data Integration, in: Proceed-
     ings of the Thirty-Sixth Annual ACM/IEEE Symposium on Logic in Computer Science
     (LICS 2021), 2021, pp. 1โ€“11.
 [9] G. Cima, Abstraction in Ontology-based Data Management, Ph.D. thesis, Sapienza Univer-
     sity of Rome, 2020.
[10] G. Cima, M. Console, M. Lenzerini, A. Poggi, Monotone abstractions in ontology-based
     data management, in: Proceedings of the Thirty-Sixth AAAI Conference on Artificial
     Intelligence (AAAI 2022), 2022.
[11] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison Wesley Publ. Co., 1995.
[12] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The Descrip-
     tion Logic Handbook: Theory, Implementation and Applications, Cambridge University
     Press, 2003.
[13] H. J. Levesque, G. Lakemeyer, The Logic of Knowledge Bases, The MIT Press, 2001.
[14] A. Y. Levy, M.-C. Rousset, Combining Horn rules and description logics in CARIN, Artificial
     Intelligence 104 (1998) 165โ€“209.
[15] D. Calvanese, R. Rosati, Answering recursive queries under keys and foreign keys is unde-
     cidable, in: Proceedings of the Tenth International Workshop on Knowledge Representation
     meets Databases (KRDB 2003), volume 79 of CEUR Electronic Workshop Proceedings, 2003.
๏ฟฝ