Research on a Fast Intrusion Detection Algorithm Based on Multi-Constraint Association Rules

Original English title: Fast Intrusion Detection Technology Research Based on Multi-association Mining Algorithm
Author: 杨德璋 (Yang Dezhang)
Degree level: Master's
Discipline: Pattern Recognition and Intelligent Systems
Keywords: intrusion detection; data mining; association rules; k-means clustering; multi-rule constraints
Degree year: 2011
Advisor: 李雷 (Li Lei)
Discipline code: 081104
Degree-granting institution: 南京邮电大学 (Nanjing University of Posts and Telecommunications)
Submission date: 2011-03-01

Abstract

An intrusion detection system (IDS) defends against intrusions by collecting and analyzing information from a number of key points in a computer network or computer system, looking for behavior that violates the security policy and for signs of attack. Data mining is the process of using analytical tools to extract implicit, potentially useful information and knowledge from large volumes of data. An IDS that applies data mining to large volumes of network data can therefore extract as much hidden security information as possible and achieve good detection results.
     Building on a study of data mining techniques in intrusion detection systems, this thesis proposes an improved K-means algorithm and a multi-rule-constrained Apriori algorithm, which improve the detection performance of the whole system and effectively reduce the false alarm and false positive rates. The main innovations are as follows:
     1. An improved K-means clustering algorithm (An Improved K-Means Clustering Algorithm, IMKMCA) is proposed. It addresses two limitations of the classical K-means clustering algorithm: dependence on the initial cluster centers and an excessive number of iterations. Simulation experiments confirm that the new algorithm is feasible and runs more efficiently.
     2. A fast multi-rule-constrained Apriori algorithm (A Fast Multi-constrained Apriori Algorithm, FMApriori) is proposed. After data preprocessing, a large data set is divided into several small data blocks. Using the cluster centers obtained from clustering, the algorithm applies a time-series constraint to each small data block to generate frequent itemsets, which addresses the heavy I/O load that Apriori incurs by scanning the transaction database repeatedly. Because many events covered by strong rules are not interesting events, the algorithm uses an elliptically decreasing support threshold as a constraint and optimizes the pruning step. Experiments show that the algorithm improves both the running efficiency and the detection efficiency of the intrusion detection system.
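     As background for item 2, the following is a minimal Python sketch of the classical level-wise Apriori baseline. It is not the FMApriori algorithm proposed in the thesis, whose details are not given in this abstract. The sketch counts one full database scan per itemset length, which is the repeated-scan I/O cost that item 2 refers to. The function name `apriori`, the `min_count` parameter, and the toy connection records are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Classical Apriori (baseline only, not the thesis's FMApriori).

    Returns (frequent, scans): a dict mapping each frequent itemset to its
    support count, and the number of full passes over the transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    scans = 0

    def count(candidates):
        nonlocal scans
        scans += 1  # each call is one full pass over the database
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent single items.
    level = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        if not candidates:
            break
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent, scans


# Hypothetical connection records, written as attribute=value items.
records = [
    {"proto=tcp", "service=http", "flag=SF"},
    {"proto=tcp", "service=http", "flag=SF"},
    {"proto=tcp", "service=http", "flag=S0"},
    {"proto=udp", "service=dns", "flag=SF"},
    {"proto=tcp", "service=smtp", "flag=SF"},
]
frequent, scans = apriori(records, min_count=3)
```

     On this toy data the baseline needs two full scans to find five frequent itemsets. The scan count grows with the length of the longest frequent itemset, which is why reducing passes over the transaction database matters for large network logs.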
