Cache-Conscious Automata for XML Filtering

Bingsheng He

HKUST

Hardware cache behavior is an important factor in the performance of memory-resident, data-intensive systems such as XML filtering engines. A key data structure in several recent XML filters is the automaton, which is used to represent the long-running XML queries in the main memory. In this paper, we study the cache performance of automaton-based XML filtering through analytical modeling and system measurement. Furthermore, we propose a cache-conscious automaton organization technique, called the hot buffer, to improve the locality of automaton state transitions. Our results show that (1) our cache performance model for XML filtering automata is highly accurate and (2) the hot buffer improves the cache performance as well as the overall performance of automaton-based XML filtering.