多维多层关联规则算法研究

英文题名：The Research of Multi-dimensional and Multi-level Association Rule Algorithm
作者：绳英英
论文级别：硕士
学科专业名称：计算机应用技术
中文关键词：数据挖掘 ; 关联规则 ; 多维 ; 多层 ; Apriori算法 ; 频繁项集
英文关键词：Data Mining ; Association rules ; Multidimensional ; Multiple-Level ; Apriori arithmetic ; Frequent Itemsets
学位年度：2010
导师：闫仁武
学科代码：081203
学位授予单位：江苏科技大学
论文提交日期：2009-12-21

摘要

随着数据库规模的日益扩大和数据挖掘技术的繁荣发展,关联规则技术也得到了蓬勃的发展,并正朝更为广泛和深入的方向继续发展。关联规则挖掘算法是关联规则挖掘研究的主要内容。提高关联规则的效率关键是提高关联规则算法的效率。Apriori算法是一种最有影响的挖掘单维布尔型关联规则频繁项集的算法。Apriori算法存在两大瓶颈问题:一是候选项目集的数量,二是事务数据库的扫描次数。同时Apriori算法是单维布尔型的。与经典的关联规则研究相比,目前的主要研究内容已经从单维单层次扩展到多维多层次的挖掘。运用抽象层次的概念,可能会发现新的更为抽象的规则。在实际应用中,应该从不同的角度不同的层面上进行挖掘,这种条件下产生的强关联规则对人们来说更有用。因为现在的数据多是以多维的形式存在,并且存放在关系数据库中。因此本文主要是把单维布尔型关联规则算法Apriori算法扩展到多维多层关系数据挖掘上去。
     本文在对数据挖掘及关联规则技术深入细致研究的基础上做了以下工作:
     (1)分析了关联规则的经典算法Apriori算法,包括算法思想、算法的主要步骤及算法伪码,并分析了其存在的问题,列出了一些提高Apriori有效性的方法。
     (2)在充分消化吸收经典Apriori算法的基础上提出了改进的算法,改进后的算法是适于挖掘多维关系数据的。主要描述了改进后算法的思想、算法的伪码及算法的理论正确性分析。
     (3)对改进后的算法的性能与Apriori算法的性能进行了比较试验,实验结果证明了改进后的算法在多维度等方面的优越性。
     在本文的最后,进行了文章总结和进一步工作的展望。
With the growing size of the database and prosperity of data mining technology, association rules technology also has been an explosion of development and are moving in the direction of more extensive and in-depth. Association rule mining algorithm for mining association rules is the main content of the study. The key to improve the efficiency of association rules is to improve the efficiency of association rules algorithm. Apriori algorithm is one of the most influential single-Jorgen Weibull-based association rules mining frequent itemsets algorithm. Apriori algorithm has two major bottlenecks exist: first is the number of candidate itemsets, and second is the number of scanning the transaction database. At the same time Apriori algorithm is a single-Jorgen Weibull type. Study of the classical association rules have been expanded from a one-dimensional single-level to multi-dimensional and multi-level mining. The concept of using an abstract level, you may find new, more abstract rules. In practice, it should be from different angles at different levels on the excavation, under these conditions strong association rules generated by the people to be more useful. Because many data are stored as the form of multi-dimensional data in a relational database. Therefore, this paper is to extend single-Jorgen Weibull-type algorithm Apriori algorithm for association rules to multi-dimensional multi-relational data mining up.
     On the basis of intensive research on association rules of data mining techniques, this paper has to do the following work:
     (1) Analysis of the classical algorithm Apriori association rules algorithm, including the method of thought, the algorithm main steps and algorithm pseudo-code, and analyzes its problems, listing some ways to increase the effectiveness of the method of Apriori.
     (2) In the full absorption of the classic Apriori algorithm has proposed the improved algorithm, the improved algorithm is suitable for multi-dimensional relational data mining. Mainly describes the improved algorithm, the idea of the algorithm, pseudo-code and algorithm correctness of the theoretical analysis.
     (3) Compared test has been used between improved performance of the algorithm and the Apriori algorithm. The result shows the advantages of the improved algorithm in multi-dimensional .
     Finally, this paper summarizes and discusses the further work.

引文

[1] Jiawei Han, Micheline Kamber.数据挖掘概念与技术[M].机械工业出版社, 2005.8.
    [2] GTS Ho. An intelligent information infrastructure to support the streamlining of integrated logistics workflow[J]. Expert Systems, 2004, 21(3):123-131.
    [3] R.Agrawal, R.Srikant.Fast Algorithms for Mining Association Rules[C]. Proceedings of the 20th International Conference on Very Large Databases (VLDB'94), Santiago, Chile, 1994:487-499. [4 ]高飞.关联规则挖掘算法研究[D].西安电子科技大学博士研究生学位论文. 2001年10月,p4-5.
    [5]黄进.关联规则数据挖掘算法的研究[D].西南交通大学硕士研究生学位论文. 2003年5月,p15.
    [6] AgrawalR., ImielinskiT., SwamiA. Mining Association Rules between Sets of Items in Large Databases[C]. Proceedings of the 1993 ACM SIGMOD international conference on Management of Data, ACM press New York, NY, USA, Washington, D.C., 1993, pp.207-216.
    [7] Grahne G., Zhu J. Efficiently Using Prefix-trees in Mining Frequent Itemsets[C]. Proceedings of the FIMI’03 Workshop on Frequent Itemsets Mining Implementations, 2003, pp.1-10.
    [8]缪裕青,频繁闭合项目集的并行挖掘算法研究[J].计算机科学31(2004)166-168.
    [9] PeiJ.,HanJ.,H-Mine: HyPer Strueture Mining of Frequent Patterns in Large Databases[C], ICDM 2001, pp. 441-448.
    [10] Valtehev P., Missaoui R., Godin R. Formal concept analysis for knowledge discovery and Data mining: the new challenges, in: P.Eklund (Ed.), ICFCA, Springer-Verlag Berlin Heidelberg, P.Eklund,2004, pp.352-371.
    [11] Ashrafi M.Z., Taniar D., Smith K.. An Efficient Compression Technique for Frequent Itemset Generation in Association Rule Mining[C], PAKDD, Springer-Vela Berlin Heidelberg, 2005, pp.125-135.
    [12] Bahamish H.A.A., Salam R.A., Abdullah R., Osman M.A., RashidN.A. Mining Protein Data Using Parallel/Distributed Association Rules, Proceedings of the International Conference on Information and Communication Technologies: From Theory to Applications[C], IEEE, 2004, pp.461-462.
    [13] PeiJ.,Jiang D., Zhang A. Mining Cross-graph Quasi-cliques in Gene Expression and Protein Interaction Data[C]. Proceedings of the 21st International Conference on Data Engineering (ICDE2005), 2005, pp.353-356.
    [14] Albert A. Okunade and Chutima Suraratdecha. Health care expenditure inertia in the OECD countries: A heterogeneous analysis[J]. Health Care Management Science, 2004, pp.31-42.
    [15] Werner Güth, Friederike Mangle and Axel Ockenfels. An Evolutionary Analysis of Buyer Insurance and Seller Reputation in Online Markets[J]. Theory and Decision, 2007, pp.265-282.
    [16] S.N.Eisenstadt. Social evolution and modernity: Some observations on parson’s comparative and evolutionary analysis: Person’s analysis from the perspective of multiple modernities[J]. The American Sociologist, 2004, pp5-24.
    [17]探讨数据挖掘.应对金融危机—第二届金融数据挖掘研讨会简述[J].统计与信息论坛, 2009年07期(1).
    [18]秦利.广东保监局:保险欺诈呈现逐年上升势头[N].证券时报,2009-7-8(A06).
    [19]苏建东.数据挖掘与数据分析:零售信息化的重要支持—写自第四届零售业CIO高峰论坛[J].信息与电脑,2007年08期,12-14.
    [20]徐振明,曹华娟,舒红平.关联规则数据挖掘技术在制造业中的应用[J].计算机科学,2009年4月(36),211-212,238.
    [21]檀海仁.电信运营企业集团客户的精细化营销[C].中国通信学会第五届学术年会论文集,2008年01期,1794-1798.
    [22]罗可.数据库中数据挖掘理论方法及应用研究[D].湖南大学博士学位论文,2004年10月. 15-16.
    [23] R.Agrawal, T.Imielinski, A.Swami. Mining association rules between sets of items in large databases[C]. In SIGMOD’93, Washington, D.C., pp.207-216, 1993.
    [24] Gaurav Pandey, Michael Steinbach, Rohit GuPta, Tushar Garg, and VIPin Kumar. Association Analysis-based Transformations for Protein Interaction Networks[C]. A Function Prediction Case Study. In SIGKDD’07, San Jose, California, pp.517-530, 2007.
    [25] Yuefeng Li, Wangzhong Yang, et al. Multi-Tier granule mining for representations of multidimention association rules[C]. In Porc. Of the 6th IEEE Interactional Conference on Data Mining(ICDM’06), pp.953-958, 2006.
    [26] Chun-Kit Chui, Ben Kao, Edward Hung. Mining frequent itemsets from uncertain data[C]. PAKDD 2007, LNAI 4426, pp.47-58, 2007.
    [27] Yun Sing Koh, Nathan Rountree, Richard O’Keefe. Mining interesting imperfectly sporadic rules. InProc[C]. Of 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining(PAKDD’06), LNCS 3918, pp.473-482, 2006.
    [28] Laure Berti-Equille. Quality-aware association rule mining[C]. In PAKDD 2006, pp.440-449, 2006.
    [29] James Cheng, Yiping Ke, Wilfred Ng. Maintaining Frequent Itemsets over High-Speed Data Streams[C]. In PAKDD 2006, pp.462-467, 2006.
    [30] H..D.K Moonesinghe, et al. Frequent Closed Itemset Mining Using Prefix Graphs with An Efficient Flow-Based Pruning Strategy[C]. In ICDM 2006, pp.426-435, 2006.
    [31] James Cheng,Yiping Ke, Wilfred Ng. Tolerance. Closed Frequent Itemsets[C]. In ICDM 2006, pp.139-148.
    [32]李乃乾,沈钧毅.量化关联规则挖掘及算法[J].小型微型计算机系统. 2003年12月,2275-2277.
    [33]佟强,周园春等.一种量化关联规则挖掘算法[J].计算机工程. 2007年5月,34-35,69.
    [34] R.Srikant, R.Agrawal. Mining generalized association rules[C]. In Proc. of the 2lst VLDB Conf., Zurieh, Switzerland,pp.407-419, 1995.
    [35] J.Han, Y.Fu. Discovery of multiple-level association rules from large databases[C]. IEEE Transactions on Knowledge and Data Engineering, 11(5), pp.420-431, 1999.
    [36] R.J.Miller and Y.Yang. Association rules over interval data[C]. In SIGMOD’97, pp.452-461 Arizona, USA, 1997.
    [37]谭义红,陈治平等.一种改进的约束关联规则挖掘算法[J].计算机工程. 2004年1月,71-72,84.
    [38]杨翠明,李喜苹等.一种基于数据库分解的关联规则挖掘新算法[J].湖南师范大学自然科学学报. 2007年6月,30-34.
    [39] Guoqing Chen, Qiang Wei. Fuzzy association rules and the extended mining algorithrns[J]. Information Sciences, 147, pp.201-228, 2002.
    [40] Peng Yan, Guoqing Che, Chris Comelis, Martine De Cock, Etienne Kerre. Mining positive and negative fuzzy association rules[J]. Lecture Notes in Computer Science 3213, Springer-Vela, pp.270-276, 2004.
    [41] Shu, J.; Tsang, E.; Yeung, D.S.; Shi, D.Mining fuzzy association rules with weighted items[C]. In Proc. of IEEE Int. Conf. on System, Man and Cybemetics, pp.1906-1911, 2000.
    [42]王艳,王红霞.一种新的基于Apriori算法的加权关联规则挖掘算法[J].郑州轻工业学院学报(自然科学版). 2007年6月,175-177.
    [43] ZHANG Peng, TONG Yun-Hai., An Effective Method for Privacy Preserving Association Rule Mining[J]. Journal of Software, Vol. 17, No.8, August 2006, pp.1764-1774.
    [44]张瑞,郑诚.基于隐私保护的关联规则挖掘算法[J].计算机工程. 2009年2月,78-79,82.
    [45]孙宝友,姜合等.一种新的关联规则的增量式更新算法[J].山东轻工业学院学报. 2008年6月,80-83.
    [46]葛丽娜,钟诚.一个有效的分布式并行挖掘关联规则算法[J].计算机工程与设计. 2004年8月,1258-1260.
    [47]郭秀娟,李原.序列模式算法研究——类Apriori方法[J].现代情报. 2003年12月,142-143,146.
    [48]赵奕,施鹏飞.基于频繁集的多层次交互式关联规则挖掘[J].微电子学与计算机. 2000年第3期,54-58.
    [49]孔磊.基于频繁量化约简格的非冗余关联规则发现算法研究[J].计算机应用与软件. 2008年9月,30-32.
    [50] Agrawal.R,Imielinski.T,Swami.A.,Mining Association rules between Sets of Items in large Databases[C]. In Proc.1993 ACM-SIGMOD Int.Conf.Management of Data(SIGMOD’93) 207-216, Washington D.C,July 1993.
    [51] Houtsma.M and Swami.A.Set-Oriented Mining for Association Rules in Relational Databases[C]. Proceedings of the 11th IEEE International Conference on Data Engineering, Taipei, China, 1995:25-34.
    [52] R.Agrawal and R.Srikant. Fast Algorithms for Mining Association Rules in Large Databases,[C]. Proceedings of the Twentieth International Conference on Very Large Databases Santiago, Chile,1994:487-499.
    [53] Lijuan Zhou.Study and Application of Data Mining and Data Warehouse in CIMS[J]. Proceedings of SPIE, V 5098, April, 2003, Orland, Florida USA 2003:21-25.
    [54] Anidnya Datta, Helen Thomas. The Cube Data Model: A Conceptual Model Algebra for On-line Analytical[J]. Processing in data Warehouses Decision Support System. 1997(27): 289-301.
    [55] Lijuan Zhou. The Design and Application of Data Warehouse during Modern Enterprises Environment[J]. The Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006, V6241:673-677.
    [56]胡孔法,蒋蜂,宋爱波,等.OLAP中聚集函数的更新[C].第十八界全国数据库会议论文集,2001:54-58.