Efficient Ontology-Mediated Query Answering: Extending DL-liteR and Linear and CH
Efficient Ontology-Mediated Query Answering: Extending DL-liteR and Linear and CH
Abstract
The OWL two QL profile of the OWL two Web Ontology Language, based on the family of description logics called DL-Lite, is designed so that data stored in a standard relational database system can be queried through an ontology via a rewriting mechanism, i.e., by rewriting the query into an SQL query that is then answered by the RDBMS system, without any changes to the data. In this paper we propose a language whose expressive power goes beyond that of DL-Lite while still allowing query answering via rewriting of queries into unions of conjunctive two-way regular path queries instead of SQL queries. Our language is an extension of both OWL two QL and linear and CH: OWL two QL is extended by allowing qualified existential quantification on the left-hand side of concept inclusion axioms, and linear and CH by allowing inverses in role inclusion axioms. We identify a syntactic property of the extended language that guarantees union of conjunctive two-way regular path query rewritability. We propose a novel rewriting technique for conjunctive queries under our ontology language that makes use of nondeterministic finite state automata. We show that conjunctive query answering in our setting is NLOGSPACE-complete with respect to data complexity and NP-complete for combined complexity; we also show that answering instance queries is NLOGSPACE-complete for data complexity and in PTIME for combined complexity.
One. Introduction
One. Introduction
Ontologies have been successfully employed in the conceptual modelling of data in several areas, particularly in Information Integration and the Semantic Web. An ontology is a specification of the domain of interest of an application, and can be specified using logical rules which, on the one hand, restrict the form of the underlying data, and on the other hand allow for inference of information that is not explicitly contained in the data. Description Logic is a family of knowledge representation formalisms that are able to capture a wide range of ontological constructs. Description Logics are based on concepts (unary predicates representing classes of individuals) and roles (binary predicates representing relations between classes). A Description Logic knowledge base consists of a TBox (the terminological component) and an ABox (the assertional component). The former is a conceptual representation of the schema, while the latter is an instance of the schema.
A common assumption in this context is the so-called open-world assumption, namely that the information in the ABox is sound but not complete; the TBox, in particular, specifies how the ABox can be expanded with additional information in order to answer queries. Answers to a query in this context are called certain answers, as they correspond to the answers that are true in all models of the theory constituted by the knowledge base. The set of all models is represented by the so-called expansion (or chase) of an ABox A according to a TBox T. Note that neither each model nor the set of all models is necessarily finite. The expansion (chase) is illustrated in the following example.
Example one. Consider the TBox T comprising the assertions Parent Person and Person has.Parent, where Person and Parent are concepts. The first assertion states that every individual in the class Parent is also in the class Person. In the second assertion, the concept Ehas.Parent denotes the individuals connected via the role has to some individual belonging to the concept Parent; in other words, it contains all x such that has(x, y) and Parent(y) for some y. Thus, the second assertion states that every individual in the class Person is also in the class of individuals who have a parent. Now suppose we have the ABox A equals {Person(alice)}; we can expand A according to the TBox T so as to add to it all atoms entailed by T and A; we therefore add has(alice,z0) and Parent(z0), where z0 is a so-called labelled null, that is, a placeholder for an unknown value of which we know the existence (note that, with this approach, A can be expanded further). Given the query q defined as q(x) less than has(x,y), the answer to q under T and A is {alice} because has(alice, z0) is entailed by T and A; in fact, the certain answers to q are obtained by evaluating q on the expansion and by considering answers that do not contain nulls. If we consider the query q1 defined as q1(x) plus Parent(x), the answer is empty because z0, though known to exist, is not known.
Answers to queries over Description Logic knowledge bases can be computed, for certain languages, by query rewriting. In query rewriting, a new query q' is computed (rewritten) from the given query q according to the TBox T, such that the answers to q on K equals T and A are obtained by evaluating q' on A, where A is seen as a database; it is said that q is rewritten into q' and that q' is the perfect rewriting of q with respect to T. The language of q', called the target language, can be more expressive than that of q. Query rewriting has been extensively employed in query answering over ontologies. A common rewriting technique for Description Logics and other knowledge representation formalisms, inspired by resolution in Logic Programming, has as the target language unions of conjunctive queries.
Example two. Let us consider again the knowledge base of Example one. The perfect rewriting of query q is the query q' defined as q(x) less than Person(x) U has(x, y); intuitively, q' captures the fact that, to search for individuals which are connected via the role has to some other individual, we need also to consider individuals in Person, because the TBox might infer the former from the latter. The evaluation of q' on A returns the correct (i.e., certain) answers.
The OWL two QL profile of the OWL two Web Ontology Language - which is based on the description logic DL-LiteR - is expressly designed so that query answering can be performed via query rewriting. Data (assertions) that are stored in a standard relational database can be queried through an ontology by rewriting the query into an SQL query that is then answered by the RDBMS, without any changes to the data.
Extending the expressiveness of DL-LiteR may lead to the need for a more expressive target language than SQL, i.e. than first-order queries. This occurs, for example, when qualified existential quantification is allowed on the left-hand side of axioms, i.e., formulae of the form E.R.D where R is a role and D a concept. In cases such as this, we say that the language is not first-order rewritable. The following example illustrates this issue.
Example three. Consider the TBox T equals hasParent.Person and the query defined as q of x less than Person of x. Note that an expression of the form E hasParent.Person is forbidden on the left-hand sides of axioms in DL-LiteR. It is easy to see that the query rewriting technique described earlier produces an infinite union of conjunctive queries: q of x less than Person of x, q of x less than hasParent of x,y, Person of y and all conjunctive queries of the form q of x less than hasParent of x,y1, hasParent of yk,yk plus one, Person of yk plus one, with k greater than or equal to one. This cannot be captured by an FO-rewriting.
However, by adopting the semantic web query language SPARQL one point one, database systems should be able to answer queries that are more expressive than FO queries since the property paths of SPARQL one point one are able to express navigational queries by defining regular expressions on predicates. In particular, every conjunctive two-way regular path query, as well as unions of conjunctive two-way regular path queries, can be translated to a SPARQL one point one query. Building on this, in this paper we propose a language that extends DL-LiteR but still allows query answering via a simple rewriting mechanism, with unions of conjunctive two-way regular path queries instead of SQL queries as the target language. We allow qualified existential quantification on the left-hand sides of axioms and identify a property of the resulting language that allows a rewriting into unions of conjunctive two-way regular path queries. The description logic resulting from this extension, which we call harmless linear ECHI, denoted by CHILin, is a generalisation of both DL-LiteR and linear CHI.
Example four. Recall the issue in the previous example, where a finite FO-rewriting was not feasible. In order to capture the infinite FO-rewriting, we can produce a rewriting into a conjunctive two-way regular path query q' defined as q of x less than hasParent star of x,y, Person of y, where hasParent star is a regular expression denoting all finite compositions of hasParent with itself, i.e.,
hasParent star of x, y equals hasParent of x, y union hasParent of x,z1, hasParent of yk,yk plus one,
for all i greater than or equal to one.
Contributions. This paper significantly extends earlier work where we first proposed exploiting the capabilities of navigational queries in order to allow rewriting of conjunctive queries into conjunctive two-way regular path queries under a more restrictive description logic, namely linear CHI. We also give here a complete theoretical development and full proofs. In more detail, the contributions are the following:
We define CHILin, harmless linear ECHI, an ontology language that generalises both DL-LiteR and linear ECH.
We show that instance queries, queries with a single atom in their body, under CHILin knowledge bases can be rewritten to two-way regular path queries, using an algorithm based on non-deterministic finite-state automata.
We show that conjunctive queries under CHILin knowledge bases can be rewritten to unions of conjunctive two-way regular path queries. This algorithm combines the tree-witness rewriting with the above rewriting technique for instance queries. Since unions of conjunctive two-way regular path queries can be straightforwardly expressed in SPARQL one point one by means of property paths, our approach is therefore directly applicable to real-world querying settings.
We undertake a complexity analysis for query answering under CHILin. We analyse the computational cost of query answering in terms of both data complexity, where the TBox and the query are fixed and the ABox alone is the input, and combined complexity, where the query, TBox and ABox all constitute the input. We show that answering instance queries under CHILin is NLOGSPACE-complete for data complexity and in PTIME for combined complexity; we also show that answering conjunctive queries under CHILin is NLOGSPACE-complete for data complexity and NP-complete for combined complexity.
We formally prove the correctness of our algorithms, and also that they comply with the upper complexity bounds.