Objectively evaluating condensed representations and interestingness measures for frequent itemset mining
  • Author: Albrecht Zimmermann
  • Keywords: Result verification; Data generation; Interestingness measures
  • Journal: Journal of Intelligent Information Systems
  • Publication year: 2015
  • Publication date: December 2015
  • Year: 2015
  • Volume: 45
  • Issue: 3
  • Pages: 299-317
  • Full-text size: 779 KB
  • References: Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In 20th VLDB (pp. 487–499). Chile: Morgan Kaufmann.
    Bayardo, R.J. Jr., Goethals, B., Zaki, M.J. (Eds.) (2004). FIMI 04, proceedings of the IEEE ICDM workshop on FIM implementations. Brighton.
    Bie, T.D. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.
    Blake, C., & Merz, C. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html.
    Blanchard, J., Guillet, F., Gras, R., Briand, H. (2005). Using information-theoretic measures to assess association rule interestingness. In J. Han, B.W. Wah, V. Raghavan, X. Wu, R. Rastogi (Eds.), ICDM (pp. 66–73). Houston: IEEE.
    Boulicaut, J.F., & Jeudy, B. (2001). Mining free itemsets under constraints. In M.E. Adiba, C. Collet, B.C. Desai (Eds.), IDEAS '01 (pp. 322–329).
    Brin, S., Motwani, R., Silverstein, C. (1997). Beyond market baskets: generalizing association rules to correlations. In J. Peckham (Ed.), (pp. 265–276).
    Carvalho, D., Freitas, A., Ebecken, N. (2005). Evaluating the correlation between objective rule interestingness measures and real human interest. In A. Jorge, L. Torgo, P. Brazdil, R. Camacho, J. Gama (Eds.), PKDD (pp. 453–461). Springer.
    Cooper, C., & Zito, M. (2007). Realistic synthetic data for testing association rule mining algorithms for market basket databases. In J.N. Kok, J. Koronacki, R.L. de Mántaras, S. Matwin, D. Mladenic, A. Skowron (Eds.), PKDD (pp. 398–405). Springer.
    Gouda, K., & Zaki, M.J. (2005). Genmax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 11(3), 223–242.
    Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In W. Chen, J.F. Naughton, P.A. Bernstein (Eds.), SIGMOD conference (pp. 1–12). ACM.
    Heikinheimo, H., Seppänen, J.K., Hinkkanen, E., Mannila, H., Mielikäinen, T. (2007). Finding low-entropy sets and trees from binary data. In P. Berkhin, R. Caruana, X. Wu (Eds.), KDD (pp. 350–359). ACM.
    Lenca, P., Meyer, P., Vaillant, B., Lallich, S. (2008). On selecting interestingness measures for association rules: user oriented description and multiple criteria decision aid. European Journal of Operational Research, 184(2), 610–626.
    Mampaey, M., & Vreeken, J. (2013). Summarizing categorical data by clustering attributes. Data Mining and Knowledge Discovery, 26(1), 130–173.
    Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In C. Beeri, P. Buneman (Eds.), ICDT (pp. 398–416). Springer.
    Peckham, J. (Ed.) (1997). SIGMOD 1997, May 13–15. Tucson: ACM Press.
    Pei, J., Han, J., Mao, R. (2000). Closet: an efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD workshop on research issues in data mining and knowledge discovery (pp. 21–30).
    Pei, Y., & Zaïane, O. (2006). A synthetic data generator for clustering and outlier analysis. Tech. rep.
    Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In Knowledge discovery in databases (pp. 229–248). AAAI/MIT Press.
    Ramesh, G., Zaki, M.J., Maniatty, W. (2005). Distribution-based synthetic database generation techniques for itemset mining. In IDEAS (pp. 307–316). IEEE.
    Tan, P.N., Kumar, V., Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In KDD (pp. 32–41). ACM.
    Vaillant, B., Lenca, P., Lallich, S. (2004). A clustering of interestingness measures. In E. Suzuki, S. Arikawa (Eds.), Discovery science (pp. 290–297). Springer.
    Vreeken, J., van Leeuwen, M., Siebes, A. (2007). Preserving privacy through data generation. In N. Ramakrishnan, O. Zaiane (Eds.), ICDM (pp. 685–690). IEEE.
    Wu, T., Chen, Y., Han, J. (2010). Re-examination of interestingness measures in pattern mining: a unified framework. Data Mining and Knowledge Discovery, 21(3), 371–397.
    Zaki, M.J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3), 372–390.
    Zaki, M.J., & Hsiao, C.J. (1999). Charm: an efficient algorithm for closed association rule mining. Tech. rep., CS Department, Rensselaer Polytechnic Institute.
    Zaki, M.J., & Hsiao, C.J. (2002). Charm: an efficient algorithm for closed itemset mining. In Grossman, Han, Kumar, Mannila, Motwani (Eds.), SDM. SIAM.
    Zheng, Z., Kohavi, R., Mason, L. (2001). Real world performance of association rule algorithms. In KDD (pp. 401–406).
  • Author affiliation: Albrecht Zimmermann (1)

    1. KU Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
  • Journal category: Computer Science
  • Journal subjects: Data Structures, Cryptology and Information Theory
    Artificial Intelligence and Robotics
    Document Preparation and Text Processing
    Business Information Systems
  • Publisher: Springer Netherlands
  • ISSN: 1573-7675
Abstract
Itemset mining approaches, while having been studied for more than 15 years, have been evaluated only on a handful of data sets. In particular, they have never been evaluated on data sets for which the ground truth was known. Thus, it is currently unknown whether itemset mining techniques actually recover underlying patterns. Since the weakness of the algorithmically attractive support/confidence framework became apparent early on, a number of interestingness measures have been proposed. Their utility, however, has not been evaluated, except for attempts to establish congruence with expert opinions. Using an extension of the Quest generator proposed in the original itemset mining paper, we propose to evaluate these measures objectively for the first time, showing how many non-relevant patterns slip through the cracks.
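The evaluation idea in the abstract (plant known patterns in synthetic data, mine the data, then count how many reported patterns are not part of the ground truth) can be illustrated with a small sketch. It is not the author's extended Quest generator or the paper's experimental protocol: `generate`, `evaluate`, and every parameter value below are hypothetical, the generator is a deliberately simple stand-in that inserts each planted itemset independently and adds uniform noise, and only 2-itemsets are scored, with lift used as one example of an interestingness measure on top of a support threshold.

```python
import itertools
import random


def generate(n_items, n_trans, planted, p_plant=0.3, p_noise=0.05, seed=0):
    """Hypothetical toy generator (NOT the Quest generator).

    Each transaction independently receives every planted itemset with
    probability p_plant; then each item 0..n_items-1 is added as noise
    with probability p_noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_trans):
        t = set()
        for pattern in planted:
            if rng.random() < p_plant:
                t.update(pattern)
        for item in range(n_items):
            if rng.random() < p_noise:
                t.add(item)
        data.append(frozenset(t))
    return data


def support(data, itemset):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in data if itemset <= t) / len(data)


def lift(data, a, b):
    """Observed support of {a, b} divided by the support expected under independence."""
    expected = support(data, frozenset([a])) * support(data, frozenset([b]))
    return support(data, frozenset([a, b])) / expected if expected > 0 else 0.0


def evaluate(data, planted, min_sup, min_lift):
    """Score all 2-itemsets against the planted ground truth.

    Returns (true positives, false positives, number of ground-truth pairs),
    where a ground-truth pair is any pair of items from the same planted itemset.
    """
    truth = {frozenset(pair) for pattern in planted
             for pair in itertools.combinations(pattern, 2)}
    items = sorted(set().union(*data))
    found = set()
    for a, b in itertools.combinations(items, 2):
        pair = frozenset([a, b])
        if support(data, pair) >= min_sup and lift(data, a, b) >= min_lift:
            found.add(pair)
    return len(found & truth), len(found - truth), len(truth)


if __name__ == "__main__":
    planted = [(0, 1, 2), (3, 4)]
    data = generate(n_items=10, n_trans=2000, planted=planted)
    # Support threshold only: pairs spanning two independent patterns can pass too.
    print("support only:", evaluate(data, planted, min_sup=0.05, min_lift=0.0))
    # Support plus lift: cross-pattern pairs (lift near 1) should be filtered out.
    print("support + lift:", evaluate(data, planted, min_sup=0.05, min_lift=1.5))
```

With these illustrative settings, a support-only filter also reports pairs that merely combine items from two independently planted patterns, while the added lift threshold should remove them. The false-positive count is a miniature version of the "non-relevant patterns" that the abstract says slip through the cracks; the real study uses a richer generator and a wider range of measures.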
