Objectively evaluating condensed representations and interestingness measures for frequent itemset mining

Author: Albrecht Zimmermann
Keywords: Result verification; Data generation; Interestingness measures
Journal: Journal of Intelligent Information Systems
Publication year: 2015
Publication date: December 2015
Volume: 45
Issue: 3
Pages: 299-317
Full-text size: 779 KB
Author affiliation: Albrecht Zimmermann (1)

1. KU Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
Journal category: Computer Science
Journal subjects: Data Structures, Cryptology and Information Theory; Artificial Intelligence and Robotics; Document Preparation and Text Processing; Business Information Systems
Publisher: Springer Netherlands
ISSN: 1573-7675

Abstract

Although itemset mining approaches have been studied for more than 15 years, they have been evaluated on only a handful of data sets. In particular, they have never been evaluated on data sets for which the ground truth was known. Thus, it is currently unknown whether itemset mining techniques actually recover underlying patterns. Since the weakness of the algorithmically attractive support/confidence framework became apparent early on, a number of interestingness measures have been proposed. Their utility, however, has not been evaluated, except for attempts to establish congruence with expert opinions. Using an extension of the Quest generator proposed in the original itemset mining paper, we propose to evaluate these measures objectively for the first time, showing how many non-relevant patterns slip through the cracks.
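
The idea of checking mined itemsets against a known ground truth can be illustrated with a minimal sketch. It does not use the Quest generator or the paper's extension of it: the transactions and the single "planted" pattern are hand-written toy data, the miner is a brute-force enumeration, and all names (`support`, `frequent_itemsets`, `planted`, `spurious`) are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of frequent itemsets (toy data only)."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                found[frozenset(candidate)] = s
    return found


# Assumed toy data: the pattern {a, b, c} is "planted"; item d acts as noise.
transactions = [frozenset(t) for t in ("abc", "abc", "abcd", "abd", "cd", "d")]
planted = {frozenset("abc")}

mined = frequent_itemsets(transactions, min_support=0.5)

# Compare the mining result with the ground truth.
recovered = planted & set(mined)   # planted patterns that were found
spurious = set(mined) - planted    # reported itemsets that were not planted

print(f"mined: {len(mined)}, recovered: {len(recovered)}, spurious: {len(spurious)}")
```

On this toy data the planted pattern is recovered, but its frequent subsets and the noise item are reported alongside it, which is the kind of non-relevant output that an interestingness measure or a condensed representation would be expected to filter out.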
