CFP-tree: Storing and Querying Frequent Itemsets

Guimei Liu

HKUST

Extensive efforts have been devoted to developing efficient algorithms for mining frequent patterns. However, frequent pattern mining remains a time-consuming process, especially for very large datasets. It is therefore desirable to adopt a ``mining once and using many times" strategy. Unfortunately, there has been little work reported on managing and organizing a large set of patterns for future use. In this work, we propose a disk-based data structure, CFP-tree (Condensed Frequent Pattern Tree), for organizing frequent patterns discovered from transactional databases. In addition to an efficient algorithm for CFP-tree construction, we also developed algorithms to efficiently support two important types of queries, namely queries with minimum support constraints and queries with item constraints, against the stored patterns, as these two types of queries are basic building blocks for complex frequent pattern related mining tasks. Comprehensive experimental study has been conducted to demonstrate the effectiveness of CFP-tree and efficiency of related algorithms.