Abstract: Given a query workload, a database and a set of constraints,
the view-selection problem is to select views to materialize so that the
constraints are satisfied and the views can be used to compute the queries in
the workload efficiently. A typical constraint, which we consider in the present
work, is to require that the views can be stored in a given amount of disk space.
Depending on features of SQL queries (e.g., the DISTINCT keyword) and on whether
the database relations on which the queries are applied are sets or bags, the
queries may be computed under set semantics, bag-set semantics, or bag semantics.
In this paper we study the complexity of the view-selection problem under these
semantics. We show that bag semantics is the "easiest to handle" (we show that
in this case the decision version of view selection is in NP), whereas under set
and bag-set semantics we assume further restrictions on the query workload (we
only allow queries without self-joins in the workload) to achieve the same
complexity. Moreover, while under bag and bag-set semantics filtering views
(i.e., subgoals that can be dropped from the rewriting without impacting
equivalence to the query) are practically not needed, under set semantics
filtering views can reduce significantly the query-evaluation costs. We show
that under set semantics the decision version of the view-selection problem
remains in NP only if filtering views are not allowed in the rewritings. Finally,
we investigate whether the cgalg algorithm for view selection introduced
in (Chirkova & Genesereth 2000) is suitable in our setting. We prove that this
algorithm is sound for all cases we examine here, and that it is complete under
bag semantics for workloads of arbitrary conjunctive queries and under bag-set
semantics for workloads of conjunctive queries without self-joins.
Keywords:
Data integration, query transformation, database theory.