Vol-3194/paper19
Jump to navigation
Jump to search
Paper
Paper | |
---|---|
edit | |
description | |
id | Vol-3194/paper19 |
wikidataid | Q117344918→Q117344918 |
title | Counting Database Repairs Entailing a Query: The Case of Functional Dependencies |
pdfUrl | https://ceur-ws.org/Vol-3194/paper19.pdf |
dblpUrl | https://dblp.org/rec/conf/sebd/CalauttiLPS22 |
volume | Vol-3194→Vol-3194 |
session | → |
Counting Database Repairs Entailing a Query: The Case of Functional Dependencies
Counting Database Repairs Entailing a Query: The Case of Functional Dependencies (Discussion Paper) Marco Calautti1 , Ester Livshits2 , Andreas Pieris2,3 and Markus Schneider2 1 DISI, University of Trento, Italy 2 University of Edinburgh, UK 3 University of Cyprus, Cyprus Abstract A key task in the context of consistent query answering is to count the number of repairs that entail the query, with the ultimate goal being a precise data complexity classification. This has been achieved in the case of primary keys and self-join-free conjunctive queries (CQs) via an FP/♯P-complete dichotomy. We lift this result to the more general case of functional dependencies (FDs). Another important task in this context is whenever the counting problem in question is intractable, to classify it as approximable, i.e., the target value can be efficiently approximated with error guarantees via a fully polynomial-time randomized approximation scheme (FPRAS), or as inapproximable. Although for primary keys and CQs (even with self-joins) the problem is always approximable, we prove that this is not the case for FDs. We show, however, that the class of FDs with a left-hand side chain forms an island of approximability. We see these results, apart from being interesting in their own right, as crucial steps towards a complete classification of approximate counting of repairs in the case of FDs and self-join-free CQs. 1. Introduction A database is inconsistent if it does not satisfy its integrity constraints. There is a consensus that inconsistency is a real-life phenomenon that arises due to many reasons such as integration of conflicting sources. With the aim of obtaining conceptually meaningful answers to queries posed over inconsistent databases, Arenas, Bertossi, and Chomicki introduced in the late 1990s the notion of Consistent Query Answering (CQA) [1]. The key elements underlying CQA are (i) the notion of (database) repair of an inconsistent database 𝐷, that is, a consistent database whose difference with 𝐷 is somehow minimal, and (ii) the notion of query answering based on certain answers, that is, answers that are entailed by every repair. Example 1. Consider the relation name Employee(id, name, dept) that comes with the constraint that the attribute id functionally determines name and dept. Consider also the database 𝐷 con- sisting of the tuples: (1, Bob, HR), (1, Bob, IT), (2, Alice, IT), (2, Tim, IT). It is easy to see that 𝐷 is inconsistent since we are uncertain about Bob’s department, and the name of the employee with id 2. To devise a repair, we need to keep one tuple from each conflicting pair, which leads to a SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy " marco.calautti@unitn (M. Calautti); ster.livshits@ed.ac.uk (E. Livshits); apieris@inf.ed.ac.uk (A. Pieris); m.schneider@ed.ac.uk (M. Schneider) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) �maximal subset of 𝐷 that is consistent. Observe now that the query that asks whether employees 1 and 2 work in the same department is true only in two out of four repairs, and thus, not entailed. Counting Repairs Entailing a Query. A key task in this context is to count the number of repairs of an inconsistent database 𝐷 w.r.t. a set Σ of constraints that entail a given query 𝑄; for clarity, we base our discussion on Boolean queries. Given a set Σ of constraints and a query 𝑄, the problem ♯Repairs(Σ, 𝑄) that takes as input a database 𝐷, and asks for the number of repairs of 𝐷 w.r.t. Σ that entail 𝑄, can be tractable or intractable depending on the shape of Σ and 𝑄. This leads to the natural question whether we can establish a complete classification, i.e., for every Σ and 𝑄, classify ♯Repairs(Σ, 𝑄) as tractable or intractable by simply inspecting Σ and 𝑄. We already know from [2] that for a set Σ of primary keys, and a self-join-free conjunctive queries (SJFCQs) query 𝑄, ♯Repairs(Σ, 𝑄) is either in FP or ♯P-complete, and we can determine in polynomial time, by simply analyzing Σ and 𝑄, which complexity statement holds. However, the dichotomy result in [2] does not apply when we consider the more general class of functional dependencies (FDs). This brings us to the following question: Question 1: Can we lift the dichotomy result for primary keys and SJFCQs to the more general case of functional dependencies? The closest known result related to Question 1 is for the problem ♯Repairs(Σ), where Σ is a set of FDs, that given a database 𝐷, asks for the number of repairs of 𝐷 w.r.t. Σ (without considering a query). In particular, we know from [3] that whenever Σ has a so-called left-hand side (LHS, for short) chain (up to equivalence), ♯Repairs(Σ) is in FP; otherwise, it is ♯P-complete. Approximate Counting. Another key task is to classify ♯Repairs(Σ, 𝑄), whenever it is intractable, as approximable via a so-called fully polynomial-time randomized approximation scheme (FPRAS), or as inapproximable. For a set Σ of primary keys, and a CQ 𝑄 (even with self-joins), ♯Repairs(Σ, 𝑄) is always approximable [4]. The next question is whether we can obtain a full classification for FDs: Question 2: For a set Σ of FDs, and an SJFCQ 𝑄, can we determine whether ♯Repairs(Σ, 𝑄) admits an FPRAS by inspecting Σ and 𝑄? Summary of Contributions. Concerning Question (1), we lift the dichotomy of [2] for primary keys and SJFCQs to the general case of FDs (Theorem 4). Concerning Question (2), although we do not establish a complete classification (which would resolve a challenging open problem), we show that, for every set Σ of FDs with an LHS chain (up to equivalence) and a CQ 𝑄 (even with self-joins), ♯Repairs(Σ, 𝑄) admits an FPRAS. On the other hand, we show that there is a very simple set Σ of FDs such that, for every SJFCQ 𝑄, ♯Repairs(Σ, 𝑄) does not admit an FPRAS (under a standard complexity assumption). 2. Preliminaries We consider the disjoint countably infinite sets C and V of constants and variables, respectively. For 𝑛 > 0, let [𝑛] be the set {1, . . . , 𝑛}, and for a finite set 𝑆, let ♯𝑆 be the cardinality of 𝑆. Relational Databases. A schema S is a finite set of relation names with associated arity; we write 𝑅/𝑛 to denote that 𝑅 has arity 𝑛 > 0. Each relation name 𝑅/𝑛 is associated with a tuple �of distinct attribute names (𝐴1 , . . . , 𝐴𝑛 ); we write att(𝑅) for the set {𝐴1 , . . . , 𝐴𝑛 }. A position of S is a pair (𝑅, 𝐴), where 𝑅 ∈ S and 𝐴 ∈ att(𝑅), that essentially identifies the attribute 𝐴 of 𝑅. A atom 𝛼 over S is an expression of the form 𝑅(𝑡1 , . . . , 𝑡𝑛 ), where 𝑅/𝑛 ∈ S, and 𝑡𝑖 ∈ C ∪ V for each 𝑖 ∈ [𝑛]. A fact is an atom mentioning only constants; we may say 𝑅-atom/fact to indicate that the relation name is 𝑅. For an atom 𝛼 = 𝑅(𝑡1 , . . . , 𝑡𝑛 ), with (𝐴1 , . . . , 𝐴𝑛 ) being the tuple of attribute names of 𝑅, we write 𝛼[𝐴𝑖 ] for the term 𝑡𝑖 . A database 𝐷 over S is a finite set of facts over S. We write 𝐷𝑅 , for 𝑅 ∈ S, for the database {𝑅(𝑡¯) | 𝑅(𝑡¯) ∈ 𝐷}. The active domain of 𝐷, denoted dom(𝐷), is the set of constants occurring in 𝐷. Functional Dependencies. A functional dependency (FD) 𝜑 over a schema S is an expression of the form 𝑅 : 𝑋 → 𝑌 , where 𝑅/𝑛 ∈ S and 𝑋, 𝑌 ⊆ att(𝑅); we say 𝑅-FD to indicate that the relation name is 𝑅. We call 𝜑 a key if 𝑋 ∪ 𝑌 = att(𝑅). Given a set Σ of FDs over S, we write Σ𝑅 , for 𝑅 ∈ S, for the set {𝜑 ∈ Σ | 𝜑 is an 𝑅-FD}. We call Σ a set of primary keys if it consists only of keys, and |Σ𝑅 | ≤ 1 for each 𝑅 ∈ S. A database 𝐷 over S satisfies an FD 𝜑 over S, denoted 𝐷 |= 𝜑, if, for every two 𝑅-facts 𝑓, 𝑔 ∈ 𝐷 the following holds: 𝑓 [𝐴] = 𝑔[𝐴] for every 𝐴 ∈ 𝑋 implies 𝑓 [𝐵] = 𝑔[𝐵] for every 𝐵 ∈ 𝑌 . We say that 𝐷 is consistent w.r.t. a set Σ of FDs, written 𝐷 |= Σ, if 𝐷 |= 𝜑 for every 𝜑 ∈ Σ; otherwise, 𝐷 is inconsistent w.r.t. Σ. Two sets of FDs Σ, Σ′ are equivalent if, for every database 𝐷, 𝐷 |= Σ iff 𝐷 |= Σ′ . Conjunctive Queries. A (Boolean) conjunctive query (CQ) 𝑄 over S is an expression of the form ∃𝑦¯1 , . . . , ∃𝑦¯𝑛 𝑅1 (𝑦¯1 ) ∧ · · · ∧ 𝑅𝑛 (𝑦¯𝑛 ), where each 𝑅𝑖 (𝑦¯𝑖 ), for 𝑖 ∈ [𝑛], is an atom over S. With abuse of notation we may treat 𝑄 as the set of its atoms. The query 𝑄 is a self-join-free CQ (SJFCQ) if it mentions every relation name of S at most once. Let var(𝑄) and const(𝑄) be the set of variables and constants in 𝑄, respectively. A homomorphism from 𝑄 to a database 𝐷 is a function ℎ : var(𝑄) ∪ const(𝑄) → dom(𝐷), which is the identity over C, such that 𝑅𝑖 (ℎ(𝑦¯𝑖 )) ∈ 𝐷 for 𝑖 ∈ [𝑛]. We say that 𝐷 entails 𝑄, and write 𝐷 |= 𝑄 if there exists a homomorphism from 𝑄 to 𝐷. We point out that in this paper we are only dealing with Boolean queries, but all the presented results can be generalized to non-Boolean CQs. Database Repairs. Given a database 𝐷 and a set Σ of FDs, both over a schema S, a repair of 𝐷 w.r.t. Σ is a maximal subset 𝐷′ of 𝐷 such that 𝐷′ |= Σ. Let repΣ (𝐷) be the set of repairs of 𝐷 w.r.t Σ. Given a CQ 𝑄 over S, we write repΣ,𝑄 (𝐷) for {𝐷′ ∈ repΣ (𝐷) | 𝐷′ |= 𝑄}. The following is a useful observation that we will exploit later in the paper: Lemma 2. Consider a database 𝐷, a set Σ of FDs, and a Boolean SJFCQ 𝑄. We can compute in polynomial time a database 𝐷′ such that ♯repΣ (𝐷) = ♯repΣ,𝑄 (𝐷′ ). Problem Definition. For each set Σ of FDs and CQ 𝑄, we focus on the problem ♯Repairs(Σ, 𝑄), which takes as input a database 𝐷, and asks for the number ♯repΣ,𝑄 (𝐷). The goal is to classify it as tractable (place it in FP) or as intractable (show that is ♯P-complete). Another important task is whenever ♯Repairs(Σ, 𝑄) is intractable to classify it as approx- imable via a so-called fully polynomial-time randomized approximation scheme (FPRAS, for short), or as inapproximable. Formally, an FPRAS for ♯Repairs(Σ, 𝑄) is a randomized algorithm A that takes as input a database 𝐷, and numbers 𝜖 > 0 and 0 < 𝛿 < 1, runs in polyno- mial(︀ time in ||𝐷||, 1/𝜖 and log(1/𝛿), and produces )︀ a random variable A(𝐷, 𝜖, 𝛿) such that Pr |A(𝐷, 𝜖, 𝛿) − ♯repΣ,𝑄 (𝐷)| ≤ 𝜖 · ♯repΣ,𝑄 (𝐷) ≥ 1 − 𝛿. �LHS Chain FDs. Consider a set Σ of FDs over S, and a relation name 𝑅 ∈ S. We say that Σ𝑅 has a left-hand side chain (LHS chain) if the FDs of Σ𝑅 can be arranged in a sequence 𝑅 : 𝑋1 → 𝑌1 , . . . , 𝑅 : 𝑋𝑛 → 𝑌𝑛 such that 𝑋1 ⊆ 𝑋2 ⊆ · · · ⊆ 𝑋𝑛 [3]. We call such a sequence an LHS chain of Σ𝑅 , and say that Σ has an LHS chain if, for every 𝑅 ∈ S, Σ𝑅 has an LHS chain. We know that for sets of FDs with an LHS chain, counting the number of repairs is tractable: Proposition 3 ([3]). Given a database 𝐷, and a set Σ of FDs with an LHS chain (up to equiva- lence), ♯repΣ (𝐷) is computable in polynomial time in ||𝐷||. 3. Exact Counting The goal of this section is to discuss the key ideas behind our main result on exact counting: Theorem 4. For a set Σ of FDs, and an SJFCQ 𝑄, ♯Repairs(Σ, 𝑄) is either in FP or ♯P-complete, and we can decide in polynomial time in ||Σ|| + ||𝑄|| which of the two cases hold. We observe that for every set Σ of FDs, and an SJFCQ 𝑄, ♯Repairs(Σ, 𝑄) is in ♯P: simply guess (in polynomial time) a subset 𝐷′ of the given database 𝐷, and verify (again in polynomial time) that 𝐷′ is a repair of 𝐷 w.r.t. Σ that entails 𝑄. We also note that the non-existence of an LHS chain (up to equivalence) is a preliminary boundary for the hard side of the target dichotomy, regardless of the CQ. We know from [3] that, for a set Σ of FDs, the problem ♯Repairs(Σ), which given a database 𝐷, asks for ♯repΣ (𝐷), is ♯P-hard whenever Σ does not have an LHS chain (up to equivalence). From the above discussion, and Lemma 2, we get the following result: Proposition 5. Consider a set Σ of FDs without an LHS chain (up to equivalence), and an SJFCQ 𝑄. ♯Repairs(Σ, 𝑄) is ♯P-complete. From Proposition 5, and the fact that checking whether a set Σ of FDs has an LHS chain (up to equivalence) is feasible in polynomial time [3], to obtain Theorem 4, it suffices to provide an analogous result for FDs with an LHS chain (up to equivalence). To this end, following what was done for primary keys in [2], we are going to concentrate on the problem of computing the relative frequency of the query. The relative frequency of a CQ 𝑄 w.r.t. a database 𝐷 and a ♯rep (𝐷) set Σ of FDs is defined as the ratio rfreq𝐷,Σ (𝑄) = ♯repΣ,𝑄(𝐷) that computes the percentage of Σ repairs that entail the query 𝑄. We then define the problem RelFreq(Σ, 𝑄) that takes as input a database 𝐷, and asks for rfreq𝐷,Σ (𝑄). We can indeed focus on RelFreq(Σ, 𝑄) since we know from Proposition 3 that ♯Repairs(Σ) is in FP whenever Σ has an LHS chain (up to equivalence), which implies that ♯Repairs(Σ, 𝑄) and RelFreq(Σ, 𝑄) have the same complexity, that is, both are either in FP or ♯P-hard. Consequently, to obtain Theorem 4 it suffices to show the following: Theorem 6. For a set Σ of FDs with an LHS chain (up to equivalence), and an SJFCQ 𝑄, RelFreq(Σ, 𝑄) is either in FP or ♯P-hard, and we can decide in polynomial time in ||Σ|| + ||𝑄|| which of the two cases hold. In the rest, we give some hints on the proof of Theorem 6; we first need some new notions. Canonical Covers. If Σ has an LHS chain (up to equivalence), we know that it has a single canonical cover1 Σ′ , and Σ′ has an LHS chain [3]. Furthermore, it can be verified that, for every 1 That is, (i) the FDs in Σ are not redundant and have no redundant attributes, and (ii) for a relation name 𝑅 and 𝑋 ⊆ att(𝑅), there is at most one FD in Σ of the form 𝑅 : 𝑋 → 𝑌 . �relation name 𝑅 occurring in Σ′ , there exists a unique sequence 𝑅 : 𝑋1 → 𝑌1 , . . . , 𝑅 : 𝑋𝑛 → 𝑌𝑛 of the FDs of Σ′𝑅 , which we call the LHS chain of Σ′𝑅 , such that (i) 𝑋𝑖 ⊊ 𝑋𝑖+1 for each 𝑖 ∈ [𝑛 − 1], (ii) 𝑋𝑖 ∩ 𝑌𝑗 = ∅ for each 𝑖, 𝑗 ∈ [𝑛], and (iii) 𝑌𝑖 ∩ 𝑌𝑗 = ∅ for each 𝑖, 𝑗 ∈ [𝑛] with 𝑖 ̸= 𝑗. Since a canonical cover of a set of FDs can be computed in polynomial time, to establish Theorem 6 it suffices to show it for sets of FDs with an LHS chain that are canonical. Therefore, in the rest of the section, we assume that sets of FDs are canonical. Primary FDs and Positions. Consider now an SJFCQ 𝑄. Let 𝛼𝑅 be the 𝑅-atom in 𝑄, and let Λ𝑅 = 𝑅 : 𝑋1 → 𝑌1 , . . . , 𝑅 : 𝑋𝑛 → 𝑌𝑛 be the LHS chain of Σ𝑅 . We call an FD 𝑅 : 𝑋𝑖 → 𝑌𝑖 , for some 𝑖 ∈ [𝑛], the primary FD of Σ𝑅 w.r.t. 𝑄 if 𝑋𝑖 ∪𝑌𝑖 contains an attribute 𝐴 such that at position (𝑅, 𝐴) in 𝛼𝑅 we have a variable, whereas at each position of {(𝑅, 𝐵) | 𝐵 ∈ 𝑋𝑗 ∪ 𝑌𝑗 for 𝑗 < 𝑖} in 𝛼𝑅 we have a constant. Note that there is no guarantee that the primary FD of Σ𝑅 w.r.t. 𝑄 exists. If the primary FD 𝑅 : 𝑋𝑖 → 𝑌𝑖 of Σ𝑅 w.r.t. 𝑄 exists, for some 𝑖 ∈ [𝑛], then the primary- lhs positions of 𝛼𝑅 (w.r.t. Σ) are the positions {(𝑅, 𝐴) | 𝐴 ∈ 𝑋𝑖 }, while the non-primary-lhs positions of 𝛼𝑅 (w.r.t. Σ) are the positions {(𝑅, 𝐵) | 𝐵 ∈ att(𝑅) ∖ 𝑋𝑖 }. We further call the sequence of FDs 𝑅 : 𝑋1 → 𝑌1 , . . . , 𝑅 : 𝑋𝑖−1 → 𝑌𝑖−1 the primary prefix of Σ𝑅 w.r.t. 𝑄. If the primary FD of Σ𝑅 w.r.t. 𝑄 does not exist, then, by convention, the primary-lhs positions of 𝛼𝑅 are the positions {(𝑅, 𝐴) | 𝐴 ∈ att(𝑅)}, i.e., all the positions in 𝛼𝑅 are primary-lhs, which in turn implies that 𝛼𝑅 has no non-primary-lhs positions. Moreover, the primary prefix of Σ𝑅 w.r.t. 𝑄 is defined as the sequence Λ𝑅 itself. We denote by pvarΣ (𝛼𝑅 ) the set of variables occurring in 𝛼𝑅 at primary-lhs positions of 𝛼𝑅 . Query Complex Part. A variable 𝑥 ∈ var(𝑄) is a liaison variable (of 𝑄) if it occurs more than once in 𝑄. The complex part of 𝑄 w.r.t. Σ, denoted compΣ (𝑄), is the set of atoms 𝛼 of 𝑄 in which we have a constant or a liaison variable at a non-primary-lhs position (𝑅, 𝐴), with 𝛼 being the 𝑅-atom of 𝑄, such that, for every FD 𝑅 : 𝑋 → 𝑌 in the primary prefix of Σ𝑅 w.r.t. 𝑄, 𝐴 ̸∈ 𝑋 ∪ 𝑌 . We observe that when compΣ (𝑄) = ∅, the number of repairs of 𝐷 w.r.t. Σ that Σ,𝑄 entail 𝑄 coincides with the number of repairs (without any query) of the subset 𝐷conf of 𝐷 obtained after removing the facts of 𝐷 that are somehow in a conflict with 𝑄, and thus, they Σ,𝑄 cannot appear in a repair that entails 𝑄. Since 𝐷conf can be computed in polynomial time in ||𝐷||, by Proposition 3 we obtain the following: Proposition 7. Given a database 𝐷, a set Σ of FDs with an LHS chain, and an SJFCQ 𝑄 with compΣ (𝑄) = ∅, it holds that rfreq𝐷,Σ (𝑄) is computable in polynomial time in ||𝐷||. 3.1. The Tractable Side We start by defining some properties that, when satisfied by a SJFCQ, allow to restate its relative frequency in terms of the relative frequency of simpler queries. We write 𝑄1 ⊎ 𝑄2 for the CQ consisting of the atoms of two non-empty CQs 𝑄1 , 𝑄2 that share no atoms and variables, i.e., 𝑄1 ∩ 𝑄2 = ∅ and var(𝑄1 ) ∩ var(𝑄2 ) = ∅. We also write 𝑄𝑥↦→𝑐 , where 𝑥 ∈ var(𝑄) and 𝑐 ∈ C, for the CQ obtained from 𝑄 after replacing 𝑥 with 𝑐. Theorem 8. Consider a set Σ of FDs with an LHS chain, and a SJFCQ 𝑄. For every database 𝐷, the following hold: 1. If 𝑄 = 𝑄1 ⊎ 𝑄2 , then rfreq𝐷,Σ (𝑄) = rfreq𝐷,Σ (𝑄1 ) × rfreq𝐷,Σ (𝑄2 ). � 2. If compΣ (𝑄) ̸= ∅, and there exists a variable 𝑥 ∈ var(𝑄) such that 𝑥 ∈ pvarΣ (𝛼) for every 𝛼 ∈ compΣ (𝑄), then there is 𝐷* ⊆ 𝐷, computable in polynomial time in ||𝐷||, such that ⎛ ⎞ (︁ )︁ Σ,𝑄 ∏︁ (︀ )︀ ♯repΣ 𝐷 ∖ 𝐷 conf rfreq𝐷,Σ (𝑄) = ⎝1 − 1 − rfreq𝐷* ,Σ (𝑄𝑥↦→𝑐 ) ⎠ × . ♯repΣ (𝐷) 𝑐∈dom(𝐷) 3. If compΣ (𝑄) ̸= ∅, and there exists an atom 𝛼 = 𝑅(𝑡¯) ∈ compΣ (𝑄) such that pvarΣ (𝛼) = ∅, and a variable 𝑥 occurs in 𝛼 at a position of {(𝑅, ∑︀𝐴) | 𝐴 ∈ 𝑌 }, where 𝑅 : 𝑋 → 𝑌 is the primary FD of Σ𝑅 w.r.t. 𝑄, then rfreq𝐷,Σ (𝑄) = 𝑐∈dom(𝐷) rfreq𝐷,Σ (𝑄𝑥↦→𝑐 ). Item (1) of Theorem 8 is straightforward, and implies that when 𝑄 = 𝑄1 ⊎ 𝑄2 , one can compute the relative frequency of 𝑄 in polynomial time, if the same can be done for the relative frequencies of 𝑄1 and 𝑄2 . Items (2) and (3) are much more involved, and we refer the reader to the full version of the paper [5]. The main message is that when a SJFCQ 𝑄 satisfies the conditions of item (2), then its relative frequency is polynomial-time computable if so is for the relative frequency of 𝑄𝑥↦→𝑐 , for each constant 𝑐 of the database, over a certain easily computable subset 𝐷* of 𝐷, where 𝑥 is as defined in item (2). Similarly, when 𝑄 satisfies the conditions of item (3), then its relative frequency is polynomial-time computable, if so is for the relative frequency of 𝑄𝑥↦→𝑐 , for each constant 𝑐 of the database, where 𝑥 is as defined in item (3). For a set Σ of FDs with an LHS chain and an SJFCQ 𝑄, we say that 𝑄 is Σ-safe if either its complex part is empty, i.e., compΣ (𝑄) = ∅, or after recursively applying the conditions of items (1)-(3) of Theorem 8 to 𝑄, we are only left with queries having an empty complex part.2 By definition of Σ-safe queries, and by Proposition 7 and Theorem 8, we obtain the following: Theorem 9. Consider a set Σ of FDs with an LHS chain, and an SJFCQ 𝑄. If 𝑄 is Σ-safe, then RelFreq(Σ, 𝑄) is in FP. Interestingly, safety exhausts the tractable side of the dichotomy stated in Theorem 6. That is, if 𝑄 is not Σ-safe, then RelFreq(Σ, 𝑄) is ♯P-hard, as we discuss in the next section. Let us stress that checking whether 𝑄 is Σ-safe is feasible in polynomial time in ||Σ|| + ||𝑄||, which establishes the second part of Theorem 6. 3.2. The Hard Side We conclude this section by briefly discussing how the intractable side of the dichotomy stated in Theorem 6 is obtained. We know from [2] that, for every set Σ of primary keys, and an SJFCQ 𝑄 that is not Σ-safe, RelFreq(Σ, 𝑄) is ♯P-hard. Hence, with Theorem 9 in place, stating that query safety ensures tractability, we can obtain the hard side of the dichotomy by showing: Theorem 10. There is a set Σ′ of primary keys, and a non Σ′ -safe SJFCQ 𝑄′ such that, for each set Σ of FDs with an LHS chain and non Σ-safe SJFCQ 𝑄, RelFreq(Σ′ , 𝑄′ ) reduces to RelFreq(Σ, 𝑄). 2 We remark that for items (2) and (3), it is enough to recursively apply the conditions of Theorem 8 to 𝑄𝑥↦→𝑐 for only one arbitrarily chosen constant from C, and thus the notion of Σ-safety is database independent. �Theorem 10 is shown in two main steps. We first prove that there exists a set Σ* of FDs, and an SJFCQ 𝑄* such that RelFreq(Σ* , 𝑄* ) can be reduced to RelFreq(Σ, 𝑄) by means of a set of rewriting rules that extend the ones of [2]. Second, we show that there is a set Σ′ of primary keys, and an SJFCQ 𝑄′ that is not Σ′ -safe such that, for every database 𝐷, rfreq𝐷,Σ* (𝑄* ) = rfreq𝐷,Σ′ (𝑄′ ), which then implies that RelFreq(Σ′ , 𝑄′ ) reduces to RelFreq(Σ* , 𝑄* ), as needed. 4. Approximate Counting We now briefly discuss our results on approximate counting. The main contribution of this section is the following theorem; BPP is the class of decision problems that are efficiently solvable via a randomized algorithm with a bounded two-sided error [6]. Theorem 11. For every set Σ of FDs with an LHS chain (up to equivalence), and a CQ 𝑄, ♯Repairs(Σ, 𝑄) admits an FPRAS. Moreover, for Σℎ = {𝑅 : 𝐴1 → 𝐴2 , 𝑅 : 𝐴3 → 𝐴4 }, where att(𝑅) = {𝐴𝑖 | 𝑖 ∈ [4]}, ♯Repairs(Σℎ , 𝑄) admits no FPRAS, for every SJFCQ 𝑄, unless NP ⊆ BPP. To prove the approximability result we observe that ♯Repairs(Σ, 𝑄) can be seen as an in- stantiation of the so-called union of sets problem ⋃︀[7], which given a succinct representation of 𝑛 ≥ 1 sets 𝑆1 , . . . , 𝑆𝑛 , asks for the number | 𝑖∈[𝑛] 𝑆𝑖 |. Indeed, given a database 𝐷, let hom𝐷,Σ (𝑄) be the set of all homomorphic images of 𝑄 in 𝐷 that are consistent w.r.t. Σ. Formally, hom𝐷,Σ (𝑄) ⃒= {ℎ(𝑄) | ℎ is a homomorphism ⃒ from 𝑄 to 𝐷 such that ℎ(𝑄) |= Σ}. Hence, ♯repΣ,𝑄 (𝐷) = ⃒ 𝑖∈[𝑛] repΣ,𝐻𝑖 (𝐷)⃒, assuming that hom𝐷,Σ (𝑄) = {𝐻1 , . . . , 𝐻𝑛 }.3 ⃒⋃︀ ⃒ From the above discussion, and the results of [7] on the approximability of the union of sets problem, showing the existence of an FPRAS for ♯Repairs(Σ, 𝑄) boils down to showing that for each 𝐻 ∈ hom𝐷,Σ (𝑄) we can (i) compute ♯repΣ,𝐻 (𝐷), (ii) sample elements of repΣ,𝐻 (𝐷) uniformly at random, and (iii) check whether a database is in repΣ,𝐻 (𝐷), all in polynomial time in ||𝐷||. We prove that the above properties hold when Σ has an LHS chain (up to equivalence), and the first part of Theorem 11 follows. The proof of the inapproximability of ♯Repairs(Σℎ , 𝑄), for every SJFCQ 𝑄, employs a so- called gap preserving reduction from the promise problem Gap3SAT𝛾 , for some fixed 𝛾 ∈ (0, 18 ), i.e., the problem that given a Boolean formula 𝜙 in 3CNF, asks whether 𝜙 is satisfiable. The promise is that if 𝜙 is unsatisfiable, then every truth assignment makes at most 78 + 𝛾 of the clauses of 𝜙 true. For every 𝛾 ∈ (0, 81 ), Gap3SAT𝛾 is NP-complete [8]. For a fixed 𝛾 ∈ (0, 81 ), the reduction maps each formula 𝜙 to a database db(𝜙) such that the following holds: for any yes instance 𝜙𝑌 of Gap3SAT𝛾 , and any no instance 𝜙𝑁 of Gap3SAT𝛾 , the “gap” (i.e., the ratio) between the numbers ♯repΣℎ ,𝑄 (db(𝜙𝑌 )) and ♯repΣℎ ,𝑄 (db(𝜙𝑁 )) is so large that an FPRAS for ♯Repairs(Σℎ , 𝑄) could be used, by employing a small enough error 𝜖, to distinguish between yes and no instances of Gap3SAT𝛾 with high probability, and thus, placing the NP-complete problem Gap3SAT𝛾 in BPP. The Difficulty Underlying a Dichotomy. Despite our efforts, we could not get a complete approximability/inapproximability classification. However, in the full version of the paper [5] 3 By abuse of terminology, we treat each 𝐻𝑖 as a CQ consisting of facts. �we show that obtaining such a classification is as hard as solving the challenging open question whether ♯MaxMatch admits an FPRAS, i.e., the problem that given a bipartite graph 𝐺, asks for the number of maximal matchings in 𝐺; please refer to the full version for more details. 5. Conclusion The obvious problems that remain open are the following: (1) lift the dichotomy of Theo- rem 4 for FDs and self-join-free CQs to arbitrary CQs with self-joins, and (2) establish an approximability/inapproximability dichotomy for FDs and SJFCQs. We point out that approximate versions of CQA have already been considered, both in the database and ontological setting. In the database setting, to the best of our knowledge [4, 9, 10] are the only works that focus on approximations in CQA. In particular, [4] studies approximation algorithms for the relative frequency under primary keys only, which have also been experimentally evaluated in [9]. In [10], FDs are considered, but the notion of repair is based on value updates, rather than tuple deletions, as we do. In the ontological setting, different approximations have been considered. However, in this setting, approximations are understood as subsets (i.e., sound but not complete) of the standard consistent answers (e.g., see [11, 12]). References [1] M. Arenas, L. E. Bertossi, J. Chomicki, Consistent query answers in inconsistent databases, in: PODS, 1999, pp. 68–79. [2] D. Maslowski, J. Wijsen, A dichotomy in the complexity of counting database repairs, J. Comput. Syst. Sci. 79 (2013) 958–983. [3] E. Livshits, B. Kimelfeld, J. Wijsen, Counting subset repairs with functional dependencies, J. Comput. Syst. Sci. 117 (2021) 154–164. [4] M. Calautti, M. Console, A. Pieris, Counting database repairs under primary keys revisited, in: PODS, 2019, pp. 104–118. [5] M. Calautti, E. Livshits, A. Pieris, M. Schneider, Counting database repairs entailing a query: The case of functional dependencies, in: PODS (to appear), 2022, pp. 91–103. [6] S. Arora, B. Barak, Computational Complexity - A Modern Approach, Cambridge University Press, 2009. [7] R. M. Karp, M. Luby, N. Madras, Monte-carlo approximation algorithms for enumeration problems, J. Algorithms 10 (1989) 429–448. [8] J. Håstad, Some optimal inapproximability results, J. ACM 48 (2001) 798–859. [9] M. Calautti, M. Console, A. Pieris, Benchmarking approximate consistent query answering, in: PODS, 2021, pp. 233–246. [10] S. Greco, C. Molinaro, Probabilistic query answering over inconsistent databases, Ann. Math. Artif. Intell. 64 (2012). [11] S. Greco, C. Molinaro, I. Trubitsyna, Computing approximate query answers over incon- sistent knowledge bases, in: IJCAI, 2018, pp. 1838–1846. [12] T. Lukasiewicz, E. Malizia, C. Molinaro, Complexity of approximate query answering under inconsistency in datalog+/-, in: IJCAI, 2018, pp. 1921–1927. �