Containment Join Size Estimation: Models and Methods

Wei Wang

HKUST

Recent years have witnessed increasing interest in XML research, partly because XML has become the de facto standard for data interchange over the Internet. A large body of work has been reported on XML storage models and query processing techniques. However, few studies have addressed XML query optimization. In this paper, we report our study of one of the challenges in XML query optimization: containment join size estimation. The containment join is widely accepted as an important operation in XML query processing, and estimating the size of its result is essential for generating efficient XML query processing plans. We propose two models, the interval model and the position model, and a set of estimation methods based on these two models. Comprehensive performance studies were conducted. The results not only demonstrate the advantages of our new algorithms over existing algorithms, but also provide valuable insights into the tradeoffs among various parameters.