Efficient Computation of Aggregate Structural Joins

Kaiyang Liu

HKUST

Although several XML query-processing techniques have been proposed to do structural joins, no work has been done to handle aggregate structural join, which computes the aggregate value for two given XML element sets. One important kind of aggregate structural join is to determine the exact selectivity of a structural join, which proves to be crucial for overall path expression optimization and on-line systems. Related work has been done to estimate the selectivity of a path expression by applying summary techniques for XML data based on either tree simplification or a Markov model. However, because of either space restrictions or performance issues (for example, previous approaches may need to store the whole XML tree to answer an aggregate structural join), they are not applicable for aggregate structural joins. In this paper, we consider a modified aggregate B+-tree, which we call a XA-tree, to index the aggregate attribute values of XML elements. Since it is based on the B+-tree, the XA-tree supports query and update efficiently and places little additional burden on database administrators. Two basic algorithms are proposed to compute aggregate structural joins efficiently by utilizing the XA-tree. Furthermore, a heuristic is also proposed that helps to further improve the performance significantly. Extensive experiments have confirmed that our approaches greatly outperform competitors by an order of magnitude.