关联规则挖掘的相关问题研究

设为首页

收藏本站

网站地图 | English | 公务邮箱

读者指南

学术客户端

NSTL服务站

科技查新

关联规则挖掘的相关问题研究

详细信息本馆镜像全文| 推荐本文 | | 获取CNKI官网全文

英文题名：The Research on the Related Problems of Association Rule Mining
作者：张铁军
论文级别：硕士
学科专业名称：计算机应用技术
中文关键词：数据挖掘 ; 关联规则 ; 频繁模式 ; 频繁闭合模式 ; 最大频繁模式 ; 综合更新挖掘 ; 频繁模式树(FP-Tree) ; 兴趣度
英文关键词：Data mining ; Association rules ; Frequent pattern ; Frequent closed pattern ; Maximal frequent pattern ; Intergraded updating mining ; Frequent Pattern Tree (FP-Tree) ; Interestingness
学位年度：2009
导师：杨君锐
学科代码：081203
学位授予单位：西安科技大学
论文提交日期：2009-04-20

摘要

关联规则挖掘是数据挖掘领域中一个重要研究方向,而频繁模式挖掘又是关联规则、时序模式挖掘等应用中的关键技术和步骤。然而,由于挖掘频繁模式内在的计算复杂性,为了提高挖掘效率,业界相继提出了频繁闭合模式挖掘和最大频繁模式挖掘问题。在规模上,频繁闭合模式和最大频繁模式均小于频繁模式。同时频繁闭合模式集可以唯一地确定频繁模式完全集以及它们的准确支持度,而最大频繁模式隐含了所有的频繁模式,并且在某些数据挖掘应用中仅需挖掘出最大频繁模式;另外,在实际挖掘应用中,由于事务数据库可能发生变化,而且用户还会调整最小支持度以满足新的需要,因此如何对挖掘结果进行更新是一个值得研究的问题;再有,针对关联规则新的度量标准—兴趣度的度量方法也是业界关心的一个热点问题。因此,对这些问题进行研究具有重要意义。
     本文主要研究了关联规则挖掘中的相关问题,主要包括以下内容:
     首先,提出了用于挖掘频繁闭合模式的FCI-Miner算法,以及挖掘最大频繁模式的BFP-Miner算法。两个算法均利用改进的FP-Tree来压缩存储数据库中的事务,并充分利用该树的特点,使得在挖掘频繁闭合模式和最大频繁模式的过程中不需产生条件FP-Tree和候选模式,从而减少了挖掘过程中使用的存储空间和计算时间,实验结果表明,算法具有较好的性能。
     其次,提出了用于解决最小支持度和数据库都发生变化的综合更新挖掘最大频繁模式问题的IUMFPA算法。该算法利用完全FP-Tree并通过调整最大频繁模式进行快速最大频繁模式更新挖掘,实验测试和分析表明,该算法有较好的时空效率。
     最后,针对当前基于支持度—置信度框架挖掘关联规则时所反映的不足,提出了一种能反映项目集之间相关性和稀有性的度量标准—兴趣度,通过其可用来发现数据库中支持度低,而置信度强和紧密性高的规则。通过实例分析说明了该度量标准在一些应用中的有效性和实用性。
The association rule mining is a very important problem in data mining. The issue of mining frequent patterns plays a crucial role in association rule mining、sequential pattern mining, etc. Because of the time-consuming in mining frequent patterns, mining frequent closed patterns and mining maximal frequent patterns have been proposed to improve the mining efficiency. The set of frequent closed patterns or maximal frequent patterns is orders of magnitude smaller than the set of frequent patterns. The set of frequent closed patterns still contains enough information of the frequent patterns and its accurate support. The set of maximal frequent patterns contains all the set of the frequent patterns and there are applications where the set of maximal frequent patterns is adequate. In some applications, users may adjust the minimum support while database changed, and have to update the former mining results, so it is worth of studying in this case. Mining the interesting rules is another interesting issue. In all, it is very significative to do some researchs on those issues. In this paper, we have done some researches on the related problems of association rule mining. It is stated as follows:
     Firstly, two efficient algorithms FCI-Miner for mining frequent closed patterns and BFP-Miner for mining maximal frequent patterns are presented in this paper. The two algorithms all based on the improved FP-Tree (Frequent Pattern Tree) in order to compress and store the recorders of transaction database, and used depth-first search strategy without generating conditional FP-Trees and candidate patterns. The experimental evaluation on a number of real and synthetic databases shows that our algorithms outperform previous method in most cases.
     Secondly, a new integrated updating algorithm for mining maximal frequent patterns IUMFPA is proposed, which is aimed at handling the user adjusting the minimum support while database changes in order to find more useful maximal frequent patterns. It makes use of improved full FP-Tree structure and also utilizes the former FP-Tree and the mined results sufficiently. The experimental results indicate that IUMFPA performs efficiently.
     Finally, we propose a brief measure of rule interestingness to overcome the insufficient based on the support-confidence framework. It can determine the correlation and rarity of association rules, and especially be used to discover rules with strong correlation and high confidence, but low support. In the end, we take an example to demonstrate its effectiveness and practicality.

引文

[1] Han J, Kamber M. Data Mining: Concepts and Techniques. Morgan Kaufman Publishers, 2001
    [2] Frawley W. J, Piatetsky-Shapiro G, Matheus C. J. Knowledge Discovery in Database: An Overview, in Knowledge Discovery in Databases. Cambridge, MA: AAAI/MIT, 1991: 1～27
    [3] Agrawal R, Imielinshi T, Swami A. Mining association rules between sets of items in large database. In: Proceedings of ACM SIGMOD Conference on Management of Data, Washington, DC, 1993: 207～216
    [4] Agrawal R, Srikant R. Fast algorithm for mining association rules. In: Proceedings of the 20th International Conference on VLDB. Santiago, 1994: 487～499
    [5] Weiss S. M, Kulikowski C. A. Computer systems that learn: Classification and prediction methods from static, neural nets, machine learning, and expert systems. San Mateo, CA: Morgan Kaufman, 1991
    [6]朱明.数据挖掘.合肥:中国科学技术大学出版社,2002:13～14
    [7]黄解军,潘和平等.数据挖掘技术的应用研究.计算机工程与应用,2003(2):45～48
    [8] Melab N. Data Mining: A key contribution to E-business. Information and Communicati- ons Technology Law, 2001, 10(3): 309～318
    [9] Srikant R, Vu Q, Agrawal R. Mining association rules with item constraints. In: Proceedings of the 1997 International Conference on Data Mining and Knowledge Discovery. AAAI Press, 1997: 67～73
    [10] Bayardo R, Agrawal R, Gunopulos D. Constraint-Based rule mining on large dense data sets. In: Proceedings of the 1999 International Conference on Data Engineering. Sydney: IEEE Computer Society, 1999: 188～197
    [11]黄晓霞,萧蕴诗.数据挖掘集成技术研究.计算机应用研究,2003(4):37～39
    [12] Bezerra E, Mattoso M, Xexeo G. An analysis of the integration between data mining application and database system. In: Ebecken N, Brebbir C. A. Data MiningⅡ. Published by MIT Press, 2000, 151～160
    [13] Imielinski T, Virmani A, Abdulghani A. Discovery board application programing interf- ace and query language for database mining. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Ming, Portland, Oregon, USA, 1996
    [14] Han J, Fu Y, Wang W, et al. DMQL: A data mining query language for relationaldatabases. In: Proceedings of 1996 SIGMOD’96 Workshop Research Issues on Data Mining and Knowledge Discovery (DMKD’96). Montreal, Canada, 1996: 27～34
    [15] Grossman R, Bailey S, Ramu A, et al. The Management and Mining of Multiple Predictive Models Using the Predictive Modeling Markup Language, AFCEA’99
    [16] Kohavi R. Data mining and visualization. National Academy of Engineering USA Frontiers of Engineering, 2000
    [17] Morgan W, Chapple T. Report on data mining and data visualization. Msc Computing Visual Programming Unit, 1999
    [18]邹力鹍,王丽珍.空间数据挖掘发展研究.计算机工程与应用,2003,(11):186～188
    [19]胡军涛,武德峰,李国辉.多媒体数据挖掘的体系结构和方法.计算机工程,2003, 29(9):149～151
    [20] Qin Ding, Qiang Ding, Perrizo W. Association rule mining on remotely sensed images using P-trees. In: Proceedings of PAKDD, 2002
    [21] Agrawal R, Srikant R. Mining sequential patterns. In: Proceedings of the 11th International Conference on Data Engineering, Taipei, March 1995: 3～14
    [22] Cooley R, Mobasher B, Srivastava J. Web mining: information and pattern discovery on the World Wide Web. In: Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTA197), 1997
    [23] Spiliopoulou M, Faulstich L. C, Wilkler K. A data miner analyzing the navigateonal behavior of web users. In: Proceedings of the Workshop on Machine Learning in User Modeling of the ACAI99, Greece, 1999
    [24] Pei J, Han J, Mortazaviasl B, Zhu H. Mining access patterns efficiently from web logs. In: Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000: 396～407
    [25]吴际,黄传河.基于数据挖掘的入侵检测系统研究.计算机工程与应用,2003 (4):166～168
    [26] Han J, Jian P, Yiwen Y. Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD International Conference Management of Data. Dallas, 2000:1～12
    [27] Park J S, Chen M S, Yu P S. An Effective Hash-Based Algorithm for Mining Associati- on Rules. In: Proceedings of ACM SIGMOD International Conference Management of Data, San Jose, CA, 1995: 175～186
    [28]黄艳,王延章,苑森森.一种高效相联规则提取算法.吉林大学学报,1999(2):36～38
    [29] Savasere A, Omiecinski E, Navathe SM. An efficient algorithm for mining associationrules. In: Proceedings of the 21st International Conference on VLDB. Zurich, 1995: 432～444
    [30] Mannila H, Toivonen H, Verkamo I. Efficient Algorithm for Discovering Association Rules. In KDD-94: AAAI Workshop on Knowledge Discovery in Database, 1994
    [31] Toivonen H. Sampling large databases for association rules. In: Proceedings of the 22nd International Conference on Very Large Databases, Bombay, India, 1996: 1～12
    [32] Lin J, Dunham M.H. Mining association rules: Anti-skew algorithm. In: International Conference on Data Engineering (ICDE), 1998
    [33]侯俊杰,李春平.一种基于模式增长的频繁模式挖掘算法.华中科技大学学报(自然科学版),2005,12:272～274
    [34]朱光喜,吴伟民,阮幼林,等.一种基于前缀树的频繁模式挖掘算法.计算机科学, 2005,32(4):34～36
    [35] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. Proceeding of the 7th International Conference on Database Theory. 1999: 398～416
    [36] Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Proceedings of the 2000 ACM SIGMOD International Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000: 21～30
    [37] D. Burdick, M. Calimlim, J. Gehrke. Mafia: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 17th International Conference on Data Engineering. Heidelberg: IEEE Press, 2001: 443～452
    [38] Zaki MJ, Hsiao CJ. CHARM: An efficient algorithm for closed itemset mining. In: Proceedings of the 2nd SIAM International Conference on Data Mining. Arlington: SIAM, 2002: 12～28
    [39]刘君强,孙晓莹,庄越挺,潘云鹤.挖掘闭合模式的高性能算法.软件学报,2004,15(1):94～102
    [40]朱玉文,陈陵涛,刘万春,贾云得.基于频繁闭项目集的关联规则挖掘算法.北京理工大学学报,2003,23(3):345～349
    [41]陈俊杰,崔晓红.基于FP-Tree的频繁闭合项目集挖掘算法的研究.计算机工程与应用,2006,34:169～171
    [42]朱玉全,宋余庆.频繁闭项目集挖掘算法研究.计算机研究与发展,2007,44 (7):1177～1183
    [43] C. Lucchese. S. Orlando, R. Perego. Fast and memory efficient mining of frequent closed itemsets [J]. IEEE Transaction on Knowledge and Data Engineering. 2006, 18(1):21～36
    [44] Bayardo R. Efficiently mining long patterns from databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 1998: 85～93
    [45] Lin D, Kedem Z. M. Pincer-Search: a new algorithm for discovering the maximum frequent set. In: Proceedings of the 6th European Conference on Extending Database Technology. Heidelberg: Springe-Verlag, 1998: 105～119
    [46] R. Agarwal, C. Aggarwal, V. Prasad. Depth first generation of long patterns. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, ACM Press, 2000, 108～118
    [47] Gouda K, Zaki M. J. Efficiently mining maximal frequent itemsets. In: Proceedings of the 2001 IEEE International Conference on Data Mining, 2001: 163～170
    [48]路松峰,卢正鼎.快速开采最大频繁项目集.软件学报,2001,12(2):293～297
    [49]宋余庆,朱玉全,孙志挥.基于FP-Tree的最大频繁项目集挖掘及更新算法.软件学报,2003,14(9):1586～1592
    [50]颜跃进,李舟军,陈火旺.基于FP-Tree有效挖掘最大频繁项集.软件学报, 2005,16(2):215～222
    [51]秦亮曦,史忠植.SFP-Max—基于排序FP-树的最大频繁模式挖掘算法.计算机研究与发展,2005,42(2):217～223
    [52] Cheung D W, Han J, et al. Maintenance of discovered association rules in large databases: an incremental updating technique. In: Proceedings of the 12th International Conference on Data Engineering, New Orleans, Louisiana, 1996: 106～114
    [53]朱玉全,孙志挥.基于频繁模式树的关联规则增量式更新算法.计算机学报, 2003,26(1):91～96
    [54]冯玉才,冯剑琳.关联规则的增量式更新算法.软件学报,1998,9(4):301～306
    [55]周海岩.关联规则的开采和更新.软件学报,1999,10(10):1079～1084
    [56]温磊,李敏强.基于有向项集图的频繁项集增量更新挖掘算法.2004中国控制与决策学术年会论文集,2004,690～693
    [57]宋中山,成林辉,吴立峰.一种基于关联规则的增量数据挖掘算法.湖北大学学报(自然科学版),2006,28(3):240～243
    [58] Tarek F. G, Mohamed T, Hamed N. An efficient technique for incremental updating of association rules. International Journal of Hybrid Intelligent Systems. 2008, 5(1): 45～53
    [59] PSichao Z, Jilian Z, PChengqi Z. An efficient algorithm for dynamic database mining.Information Science. 2007, 177(13), 2756～2767
    [60] Valtchev P, Missaoui R, Godin R. A framework for incremental generation of frequent closed itemsets. In: Proceedings of the 2002 ACM SIGMOD International Conference Management of Data. Dallas, USA, 2002(5): 1～12
    [61]战立强,刘大昕.基于概念格的频繁闭项集增量挖掘算法研究.哈尔滨工程大学学报,2007,28(2):194～197
    [62]李芸,史琰.基于频繁闭项集挖掘的增量式维护算法.计算机工程,2008,34 (3):94～97
    [63]吉根林,杨明,宋余庆.最大频繁项目集的快速更新.计算机学报,2005,28 (1):128～135
    [64]杨君锐,赵群礼.基于FP-Tree的最大频繁项目集更新挖掘算法.华中科技大学学报(自然科学版),2004,32(11):88～90
    [65] Rymon R. Search through systematic set enumeration. In: Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning. 1992: 539～550
    [66] Piatetsky-Shapiro G. Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, AAAI/MIT Press, 1991: 229～241
    [67] Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery: An overview. In: Advances in knowledge Discovery and data Mining, California: AAAI Press, 1996: 1～36
    [68] Brin S, Motwani R, Silverstein C. Beyond market baskets: Generalizing association rules to correlations. The ACM SIGMOD96, Montreal, Canada, 1996: 255～276
    [69] Srikant R, Agrawal R. Mining generalized association rules. In: Proceedings of the 21st International Conference on Very Large Databases, Zurich, Swizerland,1995: 407～419
    [70] Savasere A, Omiecinski E, Navathe S. Mining for strong negative association rules in a large database of customer transactions. In: Proceedings of the International Conference on Data Engineering, February 1998: 494～502
    [71] Wu X, Zhang C, Zhang S. Mining both positive and negative association rules. In: the Proceedings of the 19th International Conference on Machine Learning (ICML–2002), The University of New South Wales, Sydney, Australia, 2002: 658～665
    [72] Zhang C, Zhang S. Association rules mining. LNAI 2307, Springer-Verlag Berlin Heidelberg, 2002: 47～84
    [73]周欣,沙朝锋,朱扬勇等.兴趣度—关联规则的又一个阈值.计算机研究与发展,2000,37(5):627～633
    [74]周浩峰,朱扬勇,施伯乐.一个基于兴趣度的关联规则采掘算法.计算机研究与发展,2002,39(4):450～457
    [75] P. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. Technical Report 2002-112, Army High Performance Computing Research Center, 2002
    [76] Hilderman, R, Hamilton, H, 2003. Measuring the interestingness of discovered knowle- dge: A principled approach. Intelligent Data Analysis 7 (4), 347～382
    [77] Blanchard, J, Guillet, F, Briand, H, Gras, R, May 2005. Assessing the interestingness of rules with a probabilistic measure of deviation from equilibrium. In: The XIth International Symposium on Applied Stochastic Models and Data Analysis. Brest, France, 191～200
    [78] Suzuki, E. Data mining methods for discovering interesting exceptions from an unsu- pervised table. Journal of Universal Computer Science. 2006, 12(6), 627～653
    [79] R. Hilderman, H. Hamilton. Evaluation of interestingness measures for ranking discovered knowledge. In: Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Hong Kong, 2001(4): 247～259
    [80] M. Kamber, R. Shinghal. Evaluation the interestingness of characteristic rules. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, 1996: 263～266
    [81] A. A. Freitas. On rule interestingness measures. Knowledge Based Systems. 1999(12), 303～315
    [82] R. Hilderman, H. Hmamilton, B. Barber. Ranking the interestingness of summaries fro- m data mining systems. In: Proceedings of the 12th International Florida Artificial Intelligence Research Symposium. Orlando, 1999(5), 100～106
    [83] A. Silberschatz, A. Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering. 1996, 8(6), 970～974
    [84] Symth P, Goodman R M. An information theoretic approach to rule induction from database. IEEE Transactions of Knowledge and Data Engineering. 1992, 4 (4): 301～316
    [85]程继华,郭建生,施鹏飞.挖掘所关注规则的多策略方法研究.计算机学报, 2000,23(1):47～51
    [86]赵群礼.关联规则数据挖掘方法的研究:[硕士学位论文].西安:西安科技大学,2005
    [87]温磊.基于有向集图的关联规则挖掘算法研究与应用:[博士学位论文].天津:天津大学,2003
    [88]王卉.最大频繁项集挖掘算法及应用研究:[博士学位论文].武汉:华中科技大学,2004
    [89]沈斌.关联规则相关技术研究:[博士学位论文].杭州:浙江大学,2007

常见问题　|　交通位置　|　联系我们　|　OA远程办公

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700