Mind Your Dependencies for Semantic Query Optimization
Semantic query optimization uses dependencies between attributes to formulate query transformations and reduce the number of processed rows, with a direct impact on performance. Commercial databases provide facilities to define dependencies as non-enforced constraints. The goal is to help the query optimizer in cases where the database is denormalized or dependencies were simply lost in the design. However, feeding these facilities is a manual task that is tedious and error-prone. An attractive alternative is the automatic discovery of dependencies, but the cost of finding dependencies increases with the number of rows and attributes in the dataset. In this paper, we keep the automatic discovery approach but, to reduce its cost, we focus on dependencies that match the queries currently in the pipeline (i.e., the workload). Initially, we rely on a large set of functional dependencies computed in batch with state-of-the-art algorithms from the literature. Over time, our focused dependency selector (FDSel) chooses exemplars to feed the query optimizer, which eliminates further manual interaction. The automatically selected exemplars exhibit statistical properties that resemble those of the initial dependency set, which demonstrates the effectiveness of the proposed approach. In the best-case scenario, applying FDSel for join elimination on a real-world database reduces query response time by more than one order of magnitude.
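To make the two ingredients mentioned above more concrete, the following is a minimal sketch of (a) checking whether a functional dependency holds on a set of rows and (b) keeping only the dependencies whose attributes appear together in some query of the workload. It is an illustration under stated assumptions, not the FDSel algorithm: the function names `fd_holds` and `select_for_workload`, the representation of a query as a set of attribute names, and the sample data are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]
# A functional dependency: (determinant attributes, dependent attribute).
FD = Tuple[Tuple[str, ...], str]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Return True if each combination of lhs values maps to one rhs value."""
    seen: Dict[Tuple[Hashable, ...], Hashable] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same determinant, different dependent value
        seen.setdefault(key, row[rhs])
    return True


def select_for_workload(fds: Iterable[FD],
                        query_attrs: Iterable[Set[str]]) -> List[FD]:
    """Keep dependencies whose attributes all occur in at least one query.

    Assumption: each query is summarized as the set of attributes it uses.
    """
    queries = list(query_attrs)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if any(set(lhs) | {rhs} <= attrs for attrs in queries)
    ]


# Hypothetical sample data, for illustration only.
rows = [
    {"order_id": 1, "customer_id": 10, "city": "Lyon"},
    {"order_id": 2, "customer_id": 10, "city": "Lyon"},
    {"order_id": 3, "customer_id": 11, "city": "Porto"},
]
candidate_fds: List[FD] = [(("customer_id",), "city"), (("order_id",), "city")]
workload = [{"customer_id", "city"}]

valid = [fd for fd in candidate_fds if fd_holds(rows, fd[0], fd[1])]
selected = select_for_workload(valid, workload)
```

In this toy setting both candidate dependencies hold on the data, but only `customer_id -> city` is relevant to the single query in the workload, so it is the only one that would be passed on to the optimizer.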