Vol-3194/paper61

From BITPlan ceur-ws Wiki
Revision as of 18:57, 30 March 2023 by Wf (talk | contribs) (edited by wikiedit)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Paper

Paper
edit
description  
id  Vol-3194/paper61
wikidataid  Q117344931→Q117344931
title  Investigating Monotone Abstractions
pdfUrl  https://ceur-ws.org/Vol-3194/paper61.pdf
dblpUrl  https://dblp.org/rec/conf/sebd/CimaCLP22
volume  Vol-3194→Vol-3194
session  →

Investigating Monotone Abstractions

load PDF

Investigating Monotone Abstractions
Gianluca Cima2 , Marco Console1 , Maurizio Lenzerini1 and Antonella Poggi1
1
    Sapienza Università di Roma
2
    CNRS & University of Bordeaux


                  Abstract
                  In Ontology-based Data Management (OBDM), an abstraction of a source query q is a query over the
                  ontology capturing the semantics of q in terms of the concepts and the relations available in the ontology.
                  Since a perfect characterisation of a source query may not exist, the notions of best sound and complete
                  approximations of an abstraction have been introduced and studied in the typical OBDM context, i.e.,
                  in the case where the ontology is expressed in DL-Lite, and source queries are expressed as unions of
                  conjunctive queries (UCQs). Interestingly, if we restrict our attention to abstractions expressed as UCQs,
                  even best approximations of abstractions are not guaranteed to exist. Thus, a natural question to ask is
                  whether such limitations affect even larger classes of queries. In this paper, we answer this fundamental
                  question for an essential class of queries, namely the class of monotone queries. We define a monotone
                  query language based on disjunctive Datalog enriched with an epistemic operator, and show that its
                  expressive power suffices for expressing the best approximations of monotone abstractions of UCQs.

                   Keywords
                  Abstraction, Disjunctive Datalog, Monotone queries, Epistemic queries




1. Introduction
In Ontology-based Data Management (OBDM) [1], an ontology, i.e., a formal, logic-based repre-
sentation of a domain of interest, is used to provide a high-level conceptual tool for accessing
and managing the data sources of an information system. Suitable mappings declaratively
specify the relationship between the data at the sources and the elements in the ontology, and
this enables the user to carry out many relevant tasks on data through the lens of the ontology
[2, 3].
   Recent papers [4, 5, 6, 7, 8] address a novel issue in OBDM: starting from a query 𝑞𝒮 expressed
over the sources, the goal is to find a so-called abstraction of 𝑞𝒮 [9], i.e., an ontology-based
characterization of 𝑞𝒮 expressed in terms of the ontology elements, and whose answers coincide
with the answer to the original query, modulo the ontology and the mapping. We encountered
the need of abstraction during a joint project with a public statistical research institute. The
institute’s departments must publish subsets of the data they gather in the form of semantically
described linked open data. To compute the content of the datasets the departments execute
suitable queries over the data sources mapped to a shared ontology. Notably, when the dataset is
published, it must be documented through a SPARQL query expressed in terms of the ontology.

SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy
" gianluca.cima@u-bordeaux.fr (G. Cima); console@diag.uniroma1.it (M. Console); lenzerini@diag.uniroma1.it
(M. Lenzerini); poggi@diag.uniroma1.it (A. Poggi)
    © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR Workshop Proceedings (CEUR-WS.org)
�This task is currently done manually. The notion of abstraction perfectly captures this scenario
and provides the formal tool for automating the process: given the query over the sources
computing the content of the dataset, the abstraction of such query with respect to the mapping
and the ontology is exactly the SPARQL query to be associated to the open dataset. Besides
the above use case, abstraction can be the appropriate tool in various scenarios. For additional
insights we refer to the references mentioned above.
   The first investigations on abstraction appear in [4, 5]. Both papers point out that the “perfect”
abstraction of a union of conjunctive queries (UCQ) expressed over the data source not always
exists, and present algorithms for computing such abstraction in the case where it both exists,
and can be expressed as a UCQ over the ontology. In [4, 6] the notion of (sound and complete)
approximations of the perfect abstraction is introduced, exactly to cope with situations in which
perfectness cannot be achieved. Moreover, both papers make it clear that, for a given class
of queries 𝐶, one is probably interested in two specific forms of approximations, called 𝐶-
minimally complete and 𝐶-maximally sound abstractions. Based on these notions, [6] presents
a thorough analysis of the verification problem (check whether a given query is a complete
or sound abstraction) and the computation problem of UCQ-minimally complete and UCQ-
maximally sound abstractions of UCQ source queries in OBDM systems based on DL-Lite. In [7]
the computation problem is studied in the context of a specific class of non-monotone queries
for expressing abstractions, and it is shown that this class can provide abstractions that are
better than the one in the UCQ class.
   Thus, with the exception of [7], all the results on abstractions have been obtained under the
assumptions that abstractions are expressed as UCQs over the ontology, and many of them
originate from the observations that best approximations in the UCQ class are not guaranteed
to exist. Thus, a natural issue to investigate is whether such limitations affect even larger
classes of queries for expressing abstractions. The main goal of this paper is to address the
following question: do approximations of perfect abstraction that are best in a fundamental
class of queries, namely the class of monotone queries, always exist? Obviously, a related goal
is to derive algorithms for computing approximations of abstractions that are best in the class
of monotone queries, if they exist. Note that the class of monotone queries includes queries
expressible in First-Order Logic and is therefore extremely important.
   In this paper we answer positively to the above-mentioned fundamental question. More
specifically, the contributions of the paper can be summarized as follows. We present a general
framework for abstraction in OBDM, based on the definition of queries as functions from the
logical models of OBDM systems to sets of tuples. The framework includes a new monotone
query language, called Datalog∨    K , based on disjunctive Datalog enriched with inequalities and
an epistemic operator (Section 2). We consider a scenario where the OBDM specification 𝐽 is
based on DL-LiteRDFS (i.e., the fragment of RDFS expressible in Description Logic), and show that,
in the considered scenario, for any source UCQ 𝑞𝒮 , the best (sound or complete) approximations
of the 𝐽-abstraction of 𝑞𝒮 in the class of monotone queries always exists and can be expressed
in Datalog∨ K (Section 3). As a consequence, if the perfect abstraction exists and is in the class of
monotone queries, then it can be expressed in Datalog∨     K (Section 4).
   This paper is an extended abstract of [10]. Hence, while we assume basic knowledge about
databases [11] and Description Logics (DL) [12], for specific concepts and notations, we refer to
the Preliminaries section of [10].
�2. Framework
In what follows, we implicitly refer to an OBDM specification 𝐽 = ⟨𝒪, 𝒮, ℳ⟩, and when we
denote a query by 𝑞𝒪 (resp., 𝑞𝒮 ) we mean that the query is a query for 𝐽 (resp., a source query),
i.e., is over the signature of the ontology 𝒪 (resp., the schema 𝒮). We follow [6] for the basic
definitions related to abstraction.
                                                         𝐽,𝐷
   We say that 𝑞𝒪 is a perfect 𝐽-abstraction of 𝑞𝒮 if 𝑞𝒪     = 𝑞𝒮𝐷 , for each 𝒮-database 𝐷 consistent
with 𝐽.
   As the condition for an ontology query to be a perfect abstraction of a source query is a strong
one, it might be very well the case that a perfect abstraction of a source query does not exist. It
is then reasonable to consider weaker notions, such as sound or complete approximations, of
perfectness.
                                                                                 𝐽,𝐷         𝐽,𝐷
   We say that 𝑞𝒪 is a complete (resp. sound) 𝐽-abstraction of 𝑞𝒮 if 𝑞𝒮𝐷 ⊆ 𝑞𝒪        (resp. 𝑞𝒪   ⊆ 𝑞𝒮𝐷 ),
for each 𝒮-database 𝐷 consistent with 𝐽.
   Obviously, we might be interested in complete or sound abstractions that approximate 𝑞𝒮 at
best, at least in the context of a specific class of queries. If ℒ𝒪 is a class of queries, we say that a
query 𝑞𝒪 ∈ ℒ𝒪 is an ℒ𝒪 -minimally complete (resp., ℒ𝒪 -maximally sound) 𝐽-abstraction of 𝑞𝒮
if 𝑞𝒪 is a complete (resp., sound) 𝐽-abstraction of 𝑞𝒮 and there is no query 𝑞𝒪      ′ ∈ ℒ such that
                                                                                            𝒪
𝑞𝒪′ is a complete (resp., sound) 𝐽-abstraction of 𝑞 and 𝑞 ′ ⊏ 𝑞 (resp., 𝑞 ⊏ 𝑞 ′ ).
                                                       𝒮        𝒪   𝐽 𝒪            𝒪   𝐽 𝒪
   We now introduce a new language, called Datalog∨          K , for expressing queries over OBDM
specifications. The language is based on disjunctive Datalog, and is used in this paper for
expressing abstractions. The basic component of a Datalog∨           K query is a rule. Assume two
disjoint and countably infinite sets of predicates ℰ and 𝒫, called extensional and intensional,
respectively. A Datalog∨  K rule has one of the following forms:

    • The typical form of disjunctive Datalog, i.e.,

                                          ¯ ) → 𝛼1 (𝑥
                                        𝛾(𝑥         ¯ 1 ) ∨ . . . ∨ 𝛼𝑛 (𝑥
                                                                        ¯𝑛)                                 (1)

      where 𝛾(𝑥 ¯ ) is a conjunction of relational atoms on the predicates of 𝒫 with ⃗𝑥 as variables,
                                    ¯ 𝑖 ) is a single relational atom whose predicate is in 𝒫 such that
      and for 𝑖 = 1, . . . , 𝑛, 𝛼𝑖 (𝑥
      ¯𝑖 ⊆ 𝑥
      𝑥    ¯,
    • A new form specified as follows

                                     ¯ , 𝑧¯) ∧ 𝜉(𝑥
                             K(∃𝑧¯.𝜑(𝑥           ¯ )) → 𝜓1 (𝑥
                                                            ¯ 1 ) ∨ . . . ∨ 𝜓𝑛 (𝑥
                                                                                ¯𝑛)                         (2)

      where 𝜑 is a conjunction of relational atoms over ℰ, 𝜉(𝑥           ¯ ) is a conjunction of inequality
      atoms involving only variables from 𝑥        ¯ and for 𝑗 = 1, . . . , 𝑛, 𝜓𝑗 = ∃𝑦¯𝑗 .𝛾𝑗 (𝑥
                                                                                              ¯ 𝑗 , 𝑦¯𝑗 ), where
      𝛾𝑗 is a conjunction of relational atoms on 𝒫. When 𝑥        ¯ 𝑗 contains only variables 𝑥   ¯ occurring
      in 𝜑(𝑥¯ , 𝑧¯), for 𝑗 = 1, . . . , 𝑛, we say that the rule is safe.
   An 𝑛-ary Datalog∨                                                                    ∨
                    K query 𝑞𝒪 over an OBDM specification 𝐽 is a finite set of DatalogK rules
whose extensional predicates coincide with the alphabet of 𝒪, and whose intensional predicates
include a special 𝑛-ary predicate 𝐴𝑛𝑠. We say that 𝑞𝒪 is safe if all of its rules are safe. The
semantics of 𝑞𝒪 is provided relative to an OBDM system. Given an OBDM system ⟨𝐽, 𝐷⟩, an
interpretation for 𝑞𝒪 is a pair 𝐼 = (𝑚𝑜𝑑(⟨𝐽, 𝐷⟩), 𝑓 ), where 𝑓 is a first-order interpretation
�(with domain Const) for the predicates in 𝒫. As usual, we may also see 𝑓 as the set of facts
{𝑝(𝑐¯) | ¯𝑐 ∈ 𝑝𝑓 }. We now define when 𝐼 satisfies a Datalog∨
                                                            K rule.

    • 𝐼 satisfies a rule of the form (1) if the first-order formula ∀𝑥      ¯ ) → 𝛼1 (𝑥
                                                                       ¯ .𝛾(𝑥         ¯ 1 )∨. . .∨𝛼𝑛 (𝑥
                                                                                                      ¯𝑛)
      is true in 𝑓 ,
    • 𝐼 satisfies a rule of the form (2) if for all tuples ¯𝑐 of constants in Const, the fact that the
      first-order formula ∃𝑧¯.𝜑(𝑐¯, 𝑧¯) ∧ 𝜉(𝑐¯) is satisfied by every model in mod (⟨𝐽, 𝐷⟩) implies
      that 𝑓 satisfies the first-order formula ∃𝑦¯𝑗 .𝛾𝑗 (𝑐¯𝑗 , 𝑦¯𝑗 ), for some 𝑗 = 1, . . . , 𝑛. Observe
      that, if the rule is unsafe, ¯𝑐𝑗 may contain constants of Const that do not occur in ¯𝑐.
  An interpretation 𝐼 for 𝑞𝒪 is called a model of 𝑞𝒪 if all the rules of 𝑞𝒪 are satisfied by 𝐼. Finally,
we define the notion of answers to an 𝑛-ary Datalog∨   K query 𝑞𝒪 w.r.t. an OBDM system ⟨𝐽, 𝐷⟩,
            𝐽,𝐷
denoted by 𝑞𝒪   , as follows: {𝑐¯ ∈ Dom(𝐽, 𝐷, 𝑞𝒪 )𝑛 | ¯𝑐 ∈ 𝐴𝑛𝑠𝑓 for each model (𝑚𝑜𝑑(⟨𝐽, 𝐷⟩), 𝑓 ) of 𝑞𝒪 }.
  Let 𝐽 = ⟨𝒪, 𝒮, ℳ⟩ be an OBDM specification such that 𝒪 = ∅, 𝒮 = {𝑠1 /2, 𝑠2 /1, 𝑠3 /1} and
ℳ = {m1 , m2 , m3 , m4 }, where:
    m1 :     ∀𝑥1 , 𝑥2 .𝑠1 (𝑥1 , 𝑥2 )   →     E(𝑥1 , 𝑥2 )
    m2 :     ∀𝑥.𝑠1 (𝑥, 𝑥)              →     SN(𝑥)
    m3 :     ∀𝑥.𝑠2 (𝑥)                 →     ∃𝑧.E(𝑥, 𝑧)
    m4 :     ∀𝑥.𝑠3 (𝑥)                 →     E(𝑥, 𝑥)
The mapping ℳ of 𝐽 establishes how predicates in the schema 𝒮 relate to the ontology
predicates E (which stands for edge) and SN (which stands for special node).
   Let us now consider the following safe Datalog∨     K query 𝑞𝒪 over 𝐽. 𝑞𝒪 returns all the pairs
(𝑣1 , 𝑣2 ) of special nodes that are known to be distinct and such that there is a path from 𝑣1 to 𝑣2
passing only for nodes known to be special:
    K(E(𝑥1 , 𝑥2 ) ∧ SN(𝑥1 ) ∧ SN(𝑥2 ))           →     𝑇1 (𝑥1 , 𝑥2 )
    K(SN(𝑥1 ) ∧ SN(𝑥2 ) ∧ 𝑥1 ̸= 𝑥2 )             →     𝑇2 (𝑥1 , 𝑥2 )
    𝑇1 (𝑥1 , 𝑦) ∧ 𝑇1 (𝑦, 𝑥2 )                    →     𝑇1 (𝑥1 , 𝑥2 )
    𝑇1 (𝑥1 , 𝑥2 ) ∧ 𝑇2 (𝑥1 , 𝑥2 )                →     Ans(𝑥1 , 𝑥2 )

   The following proposition shows that Datalog∨    K is a monotone query language.
   Every Datalog∨ K query  over 𝐽 is in M 𝐽 , for every OBDM specification 𝐽.

   The semantics should make it clear that K is the knowledge operator in the S5 epistemic
logic: the formula K𝐴 should be read as “𝐴 is known (i.e., logically implied) by the system”
[13]. Therefore, when accessing the information modeled by ⟨𝐽, 𝐷⟩, a Datalog∨     K query extracts
what is known by the system, and this characteristic is crucial for not falling into undecidability
resulting from using Datalog rules jointly with Description Logics (see [14, 15]), as stated in the
following proposition. Let Σ be an OBDM system. Answering safe Datalog∨        K queries w.r.t. Σ is
decidable if and only if answering CQs w.r.t. Σ is decidable .
                                                             1

   Although our framework is general enough to consider any DL for expressing ontologies
and any query language for expressing source queries, in the rest of this paper we will carry
out our investigation in the following setting: (𝑖) ontologies are expressed in DL-LiteRDFS , and
(𝑖𝑖) source queries are expressed as UCQs.
    1
    With answering we implicitly refer to the associated recognition problem, i.e., check whether a tuple is in the
answer to a query.
�   At this point, one may wonder whether Datalog∨  K is the right language to express monotone
abstractions in this setting. While a thorough analysis of the language is outside the scope of
the present paper, the following proposition provides a positive answer to this question, at least
from the computational point of view.
   In our setting, (𝑖) answering safe Datalog∨K queries is in coNP in data complexity, and (𝑖𝑖)
there exists an OBDM specification 𝐽 and a CQ 𝑞𝒮 such that, given an 𝒮-database 𝐷, answering
the M-maximally sound 𝐽-abstraction of 𝑞𝒮 is coNP-hard in data complexity.
   To ease the presentation, from now on we assume that mappings and source queries do not
mention constants. However, all our results can be straightforwardly adapted to the case where
constants are allowed.


3. Monotone approximations of abstractions
In this section we investigate the problem of the existence of M-minimally complete and M-
maximally sound abstractions, in the restricted setting specified at the end of the previous
section. In particular, we start by showing that M-minimally complete abstractions of UCQ
source queries always exist and can be expressed in Datalog∨            K . Then, because of the lack of
space, we will just mention the corresponding result on M-maximally sound abstractions.
   Given a CQ 𝑞𝒮 = {𝑥                 ¯ , 𝑦¯)}, the subroutine SaturateQ(𝑞𝒮 ) computes a UCQ̸= in the
                            ¯ | ∃𝑦¯.𝜑(𝑥
following way: for each possible unifier 𝜇 on the variables in 𝑥       ¯ ∪ 𝑦¯ such that 𝜇(𝑥) ∈ 𝑥    ¯ for each
𝑥∈𝑥    ¯ , SaturateQ(𝑞𝒮 ) contains a query obtained from 𝜇(𝑞𝒮 ) by adding the inequality atom
𝑡1 ̸= 𝑡2 for each pair of distinct variables 𝑡1 , 𝑡2 occurring in 𝜇(𝑞𝒮 ). For a UCQ 𝑞𝒮 , we denote
by SaturateQ(𝑞𝒮 ) the UCQ̸= obtained by applying SaturateQ(𝑞) to each disjunct 𝑞 of 𝑞𝒮 . We
write each CQ̸= 𝑞 generated by SaturateQ(𝑞𝒮 ) as 𝑞 = {𝑥         ¯ | ∃𝑦¯.𝜑(𝑥¯ , 𝑦¯) ∧ 𝜉(𝑥
                                                                                       ¯ , 𝑦¯)}, where 𝜑(𝑥  ¯ , 𝑦¯)
and 𝜉(𝑥   ¯ , 𝑦¯) are the conjunctions of the relational atoms over 𝒮 and of the inequality atoms,
respectively, occurring in the body of 𝑞.
   Moreover, for an OBDM specification 𝐽 = ⟨𝒪, 𝒮, ℳ⟩ and a CQ̸= 𝑞 = {𝑥                       ¯ | ∃𝑦¯.𝜑(𝑥
                                                                                                       ¯ , 𝑦¯) ∧
                                                                 ∨
  ¯ , 𝑦¯)} over 𝒮, we denote by 𝑟𝑞 the following DatalogK rule of form (2):
𝜉(𝑥

                                                    ¯ , 𝒴¯ )) → Ans(𝑥
                                𝑟𝑞 = K(∃𝑧¯.ℳ(𝑞) ∧ 𝜉(𝑥               ¯ ),

where (i) ℳ(𝑞) is computed by simply ignoring the inequality atoms and chasing the set of
relational atoms occurring in the body of 𝑞; (ii) 𝒴¯ ⊆ 𝑦¯ is the subset of the existential variables of
𝑞 occurring in ℳ(𝑞); (iii) 𝑧¯ are the fresh variables introduced when computing ℳ(𝑞); and (iv)
𝜉(𝑥¯ , 𝒴¯ ) is the conjunction of the inequality atoms obtained from 𝜉(𝑥 ¯ , 𝑦¯) by removing all those
atoms of the form 𝑦 ̸= 𝑡 and 𝑡 ̸= 𝑦 in which 𝑦 is an existential variable occurring in 𝑦¯ but not in
¯ (i.e., not in ℳ(𝑞)) and 𝑡 is any other possible variable. Observe that the epistemic operator
𝒴
is exploited to bind the existential variables coming from 𝑞. This is achieved by pushing the
subset 𝒴¯ of the existential variables 𝑦¯ of 𝑞 occurring in ℳ(𝑞) inside the K operator.
   We are now ready to present the algorithm M-MinComplete for computing the M-minimally
complete 𝐽-abstractions. Given an OBDM specification 𝐽 = ⟨𝒪, 𝒮, ℳ⟩ and a UCQ 𝑞𝒮 over 𝒮
such that SaturateQ(𝑞𝒮 ) = 𝑞1 ∪ . . . ∪ 𝑞𝑛 , M-MinComplete(𝐽, 𝑞𝒮 ) outputs the Datalog∨       K query
𝑞𝒪 = {𝑟𝑞1 , . . . , 𝑟𝑞𝑛 } over 𝐽.
�    Consider the OBDM specification 𝐽 illustrated in Example 2 and the CQ 𝑞𝒮 = {(𝑥) |
∃𝑦.𝑠1 (𝑥, 𝑦)} over 𝒮. One can verify that M-MinComplete(𝐽, 𝑞𝒮 ) returns the following safe
Datalog∨ K query 𝑞𝒪 over 𝐽 asking for all those nodes 𝑣 such that either 𝑣 is connected to a node
𝑣 ′ known to be different from 𝑣 or 𝑣 is a special node with a self-loop:
    K(E(𝑥, 𝑦) ∧ 𝑥 ̸= 𝑦)        →   Ans(𝑥)
    K(E(𝑥, 𝑥) ∧ SN(𝑥))         →   Ans(𝑥)
Note that 𝑞𝒪 is a better complete approximation than the query {(𝑥) | ∃𝑦.E(𝑥, 𝑦)}, which is
the UCQ-minimally complete 𝐽-abstraction of 𝑞𝒮 [6].
   M-MinComplete(𝐽, 𝑞𝒮 ) terminates and returns the unique (up to 𝐽-equivalence) M-minimally
complete 𝐽-abstraction of 𝑞𝒮 . Furthermore, we observe that the result of M-MinComplete(𝐽, 𝑞𝒮 )
is independent from the assertions occurring in the ontology of the OBDM specification 𝐽.
Similar results can be obtained for OBDM specifications based on more expressive Horn DL
ontologies.
   Before concluding this section, we observe that M-MinComplete(𝐽, 𝑞𝒮 ) may return unsafe
Datalog∨  K queries. Nevertheless, these queries enjoy nice computational properties in our
setting.
   Let 𝑞𝒪 be a Datalog∨  K query with only rules of the form K(∃𝑧        ¯ , 𝑧¯) ∧ 𝜉(𝑥
                                                                     ¯.𝜑(𝑥           ¯ )) → Ans(𝑥
                                                                                                ¯ 𝑎 ).
In our setting, (i) answering 𝑞𝒪 is in PTime in data complexity, and (ii) if 𝑞𝒪 is safe, then it is
                                                      𝐽,𝐷
possible to compute a UCQ̸= 𝑞𝑟 over 𝒮 such that 𝑞𝒪        = 𝑞𝑟𝐷 , for each 𝒮-database 𝐷.
   As for best sound approximations of abstractions, it can be shown (cf. details in [10]) that,
similarly to the case of best complete abstractions, an algorithm exists that, given a UCQ 𝑞𝒮 ,
terminates and computes the unique (up to 𝐽-equivalence) M-maximally sound 𝐽-abstraction
of 𝑞𝒮 , expressed as a Datalog∨  K query. Thus, we have the following:
   Given a UCQ 𝑞𝒮 , both the unique (up to 𝐽-equivalence) M-minimally complete and the
M-maximally sound 𝐽-abstraction of 𝑞𝒮 always exist and can be expressed as Datalog∨        K queries.
Consider the OBDM specification 𝐽 illustrated in Example 2. One can verify that the following set
ℛ of safe Datalog∨  K rules is the unique (up to 𝐽-equivalence) M-maximally sound 𝐽-abstraction
of 𝑞𝒮 :
  K(E(𝑥1 , 𝑥2 ) ∧ 𝑥1 ̸= 𝑥2 )   →     𝑠1 (𝑥1 , 𝑥2 )
  K(E(𝑥, 𝑥))                   →     𝑠3 (𝑥) ∨ 𝑠1 (𝑥, 𝑥)
  K(∃𝑧.E(𝑥, 𝑧))                →     𝑠2 (𝑥) ∨ ∃𝑦.𝑠1 (𝑥, 𝑦) ∨ 𝑠3 (𝑥)
  K(SN(𝑥))                     →     𝑠1 (𝑥, 𝑥)


4. Perfect Abstractions
It follows from Theorem 3 that either the perfect abstraction of a source UCQ can be expressed
in Datalog∨ K , or it cannot be expressed as a monotone query (if it exists at all). We now present
an algorithm that, given an OBDM specification 𝐽 and a source UCQ 𝑞𝒮 , returns the perfect
𝐽-abstraction of 𝑞𝒮 , if and only if it exists and is in M. To this aim, we make use of Proposition 3,
and refer to the 𝑞𝑟 defined in that proposition as the rewriting of 𝑞𝒪 w.r.t. 𝐽.
   Our algorithm, that we call M-Perfect, goes as follows. Given an OBDM specification
𝐽 = ⟨𝒪, 𝒮, ℳ⟩ and a UCQ 𝑞𝒮 over 𝒮 as input: if (i) 𝑞𝒪 = M-MinComplete(𝐽, 𝑞𝒮 ) is safe and
�(ii) 𝑞𝑟 ⊑𝒮 𝑞𝒮 ; then return 𝑞𝒪 ; otherwise, report “no perfect 𝐽-abstraction of 𝑞𝒮 is in M”.
    In Example 3, M-Perfect(𝐽, 𝑞𝒮 ) returns 𝑞𝒪 , which is the perfect 𝐽-abstraction of 𝑞𝒮 .
    We conclude this section by establishing termination and correctness of the M-Perfect
algorithm.
    M-Perfect(𝐽, 𝑞𝒮 ) terminates and returns the unique (up to 𝐽-equivalence) perfect 𝐽-abstraction
of 𝑞𝒮 if and only if such an abstraction can be expressed in M.


5. Conclusion
We presented a thorough study of monotone abstractions of UCQs in OBDM systems. We
proved that best approximations of such abstractions always exist and introduced a query
language, Datalog∨ K , that captures them. Directions for future work are many. In the context
of monotone abstractions, we would like to investigate the case of more expressive ontology
languages, e.g., DL-Liteℛ , as well as more expressive source query languages, e.g., unions of
conjunctive queries with inequalities and disjunctive Datalog. Finally, the problem of checking
whether given best approximations are expressible in simpler and more user friendly languages
remains open.


Acknowledgements
This work has been partially supported by the ANR AI Chair INTENDED (ANR-19-CHIA-0014),
by MIUR under the PRIN 2017 project “HOPE” (prot. 2017MMJJRE), and by the EU under the
H2020-EU.2.1.1 project TAILOR, grant id. 952215.

References
 [1] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, R. Rosati, Link-
     ing data to ontologies, Journal on Data Semantics X (2008) 133–173. doi:10.1007/
     978-3-540-77688-8\_5.
 [2] M. Lenzerini, Managing data through the lens of an ontology, AI Magazine 39 (2018)
     65–74.
 [3] G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, Using ontologies for semantic
     data integration, in: A Comprehensive Guide Through the Italian Database Research Over
     the Last 25 Years, 2018, pp. 187–202.
 [4] G. Cima, Preliminary results on ontology-based open data publishing, in: Proceedings of
     the Thirtieth International Workshop on Description Logics (DL 2017), volume 1879 of
     CEUR Electronic Workshop Proceedings, 2017.
 [5] C. Lutz, J. Marti, L. Sabellek, Query expressibility and verification in ontology-based data
     access, in: Proceedings of the Sixteenth International Conference on the Principles of
     Knowledge Representation and Reasoning (KR 2018), 2018, pp. 389–398.
 [6] G. Cima, M. Lenzerini, A. Poggi, Semantic characterization of data services through
     ontologies, in: Proceedings of the Twenty-Eighth International Joint Conference on
     Artificial Intelligence (IJCAI 2019), 2019, pp. 1647–1653.
� [7] G. Cima, M. Lenzerini, A. Poggi, Non-monotonic ontology-based abstractions of data
     services, in: Proceedings of the Seventeenth International Conference on Principles of
     Knowledge Representation and Reasoning (KR 2020), 2020, pp. 243–252.
 [8] G. Cima, M. Console, M. Lenzerini, A. Poggi, Abstraction in Data Integration, in: Proceed-
     ings of the Thirty-Sixth Annual ACM/IEEE Symposium on Logic in Computer Science
     (LICS 2021), 2021, pp. 1–11.
 [9] G. Cima, Abstraction in Ontology-based Data Management, Ph.D. thesis, Sapienza Univer-
     sity of Rome, 2020.
[10] G. Cima, M. Console, M. Lenzerini, A. Poggi, Monotone abstractions in ontology-based
     data management, in: Proceedings of the Thirty-Sixth AAAI Conference on Artificial
     Intelligence (AAAI 2022), 2022.
[11] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison Wesley Publ. Co., 1995.
[12] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider (Eds.), The Descrip-
     tion Logic Handbook: Theory, Implementation and Applications, Cambridge University
     Press, 2003.
[13] H. J. Levesque, G. Lakemeyer, The Logic of Knowledge Bases, The MIT Press, 2001.
[14] A. Y. Levy, M.-C. Rousset, Combining Horn rules and description logics in CARIN, Artificial
     Intelligence 104 (1998) 165–209.
[15] D. Calvanese, R. Rosati, Answering recursive queries under keys and foreign keys is unde-
     cidable, in: Proceedings of the Tenth International Workshop on Knowledge Representation
     meets Databases (KRDB 2003), volume 79 of CEUR Electronic Workshop Proceedings, 2003.
�