Enhancement of spam detection mechanism based on hybrid $k$-mean clustering and support vector machine


Authors: Nadir Omer Fadl Elssied; Othman Ibrahim; Ahmed Hamza Osman
Keywords: $k$-mean clustering; Mechanism; Non-spam; Spam detection; SVM; Spam; $t$ test; Coefficient correlation
Journal: Soft Computing - A Fusion of Foundations, Methodologies and Applications
Publication year: 2015
Publication date: November 2015
Volume: 19
Issue: 11
Pages: 3237-3248
Full-text size: 1,592 KB
Affiliations: Nadir Omer Fadl Elssied (1)
Othman Ibrahim (1)
Ahmed Hamza Osman (1)

1. Universiti Teknologi Malaysia (UTM), Johor Bahru, Malaysia
Journal category: Engineering
Journal subjects: Numerical and Computational Methods in Engineering
Theory of Computation
Computing Methodologies
Mathematical Logic and Foundations
Control Engineering
Publisher: Springer Berlin / Heidelberg
ISSN: 1433-7479

Abstract

Spam e-mail is considered a serious violation of privacy and has become a costly, unwanted form of communication. The support vector machine (SVM) has been widely used for e-mail spam classification, yet many studies have shown that it suffers from low accuracy and long processing time when dealing with very large amounts of data. This paper proposes a hybrid approach to e-mail spam classification based on SVM and $k$-mean clustering. The approach was evaluated experimentally on the standard Spambase dataset to assess its feasibility. Combining the two methods improved the SVM and increased the accuracy of spam classification: the SVM alone achieves 96.30 % accuracy, while the proposed hybrid of SVM and $k$-mean clustering achieves 98.01 %. In addition, the experimental results on Spambase show that the improved SVM (ESVM) significantly outperforms SVM and many other recent spam classification methods.
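The abstract does not say how $k$-mean clustering and SVM are combined, so the following is only a minimal sketch of one plausible reading, not the authors' ESVM. It assumes that the distances from each sample to the $k$-means centroids are appended as extra features before a linear SVM is trained. The data are synthetic two-dimensional blobs rather than Spambase, the SVM is a plain hinge-loss stochastic gradient descent written with the standard library only, and every function name and parameter value is hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: return k centroids for a list of points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def add_cluster_features(x, centroids):
    """Assumed hybrid step: append the distance to each centroid."""
    return list(x) + [math.dist(x, c) for c in centroids]


def train_linear_svm(X, y, lr=0.01, lam=0.001, epochs=30, seed=0):
    """Linear SVM trained by SGD on the L2-regularised hinge loss (labels +1/-1)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage
            if margin < 1.0:  # sample violates the margin: hinge-loss step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def make_data(n_per_class, rng):
    """Synthetic stand-in for a spam (+1) / non-spam (-1) dataset."""
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y


def run_demo(k=2, seed=0):
    """Cluster the training set, augment the features, train and score the SVM."""
    rng = random.Random(seed)
    X_train, y_train = make_data(100, rng)
    X_test, y_test = make_data(50, rng)
    centroids = kmeans(X_train, k, seed=seed)
    train_aug = [add_cluster_features(x, centroids) for x in X_train]
    w, b = train_linear_svm(train_aug, y_train, seed=seed)
    hits = sum(
        predict(w, b, add_cluster_features(x, centroids)) == t
        for x, t in zip(X_test, y_test)
    )
    return hits / len(y_test)


if __name__ == "__main__":
    print(f"hybrid k-means + SVM test accuracy: {run_demo():.3f}")
```

The centroids are fitted on the training split only and then reused for the test split, so no test information leaks into the added features. The accuracy on this toy data says nothing about the 96.30 % and 98.01 % figures reported for Spambase.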
