XR-Tree: Indexing XML Data for Efficient Structural Join

Haifeng Jiang

Department of Computer Science, HKUST

XML documents are typically queried with a combination of value search and structure search. While querying by values can leverage traditional database technologies, evaluating structural relationships, specifically parent-child or ancestor-descendant relationship, between XML elements has imposed a great challenge to efficient XML query processing.

In this paper, we propose XR-tree, namely, XML Region Tree, which is a dynamic external memory index structure specially designed for strictly nested XML data. By taking advantage of numbering schemes proposed for XML, XR-tree indexes element nodes with their region codes, i.e. (start,end) pairs. The unique feature of XR-tree is that, for a given element e, all its ancestors(or descendants) in an element set E indexed by it can be identified with optimal worst case I/O cost. We then propose a new structural join algorithm that can efficiently evaluate the structural relationships between two XR-tree indexed element sets by most effectively skipping both ancestors and descendants that do not participate in a join. An extensive performance study was conducted and the experimental results show that the XR-tree based join algorithm provides significant improvement gains over the current state of the art.