Enhancement of spam detection mechanism based on hybrid \(k\)-mean clustering and support vector machine
  • Authors: Nadir Omer Fadl Elssied; Othman Ibrahim; Ahmed Hamza Osman
  • Keywords: \(k\)-mean clustering; Mechanism; Non-spam; Spam detection; SVM; Spam; \(t\) test; Coefficient correlation
  • Journal: Soft Computing - A Fusion of Foundations, Methodologies and Applications
  • Publication year: 2015
  • Publication date: November 2015
  • Volume: 19
  • Issue: 11
  • Pages: 3237-3248
  • Full-text size: 1,592 KB
  • Author affiliations: Nadir Omer Fadl Elssied (1)
    Othman Ibrahim (1)
    Ahmed Hamza Osman (1)

    1. Universiti Teknologi Malaysia (UTM), Johor Bahru, Malaysia
  • Journal category: Engineering
  • Journal subjects: Numerical and Computational Methods in Engineering
    Theory of Computation
    Computing Methodologies
    Mathematical Logic and Foundations
    Control Engineering
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1433-7479
Abstract
Spam e-mail is considered a serious violation of privacy and has become a costly, unwanted form of communication. Support vector machines (SVMs) have been widely used in e-mail spam classification, yet, as many studies have demonstrated, handling very large amounts of data leads to low accuracy and long processing times. This paper proposes a hybrid approach to e-mail spam classification based on SVM and \(k\)-mean clustering. The proposed approach was evaluated experimentally on the standard Spambase dataset to assess its feasibility. Combining the two methods improved the SVM and accordingly increased the accuracy of spam classification: accuracy is 96.30 % with the SVM algorithm alone and 98.01 % with the proposed hybrid of SVM and \(k\)-mean clustering. In addition, experimental results on the Spambase dataset showed that the improved SVM (ESVM) significantly outperforms SVM and many other recent spam classification methods.
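
The two accuracy figures reported in the abstract can be put in perspective with a short worked calculation. This is only a reading aid, not a result from the paper: it assumes that accuracy is the percentage of correctly classified messages and that both figures were measured on the same test data, so that the error rate is 100 % minus the accuracy. The symbols \(A\) (accuracy) and \(E\) (error rate) are introduced here for illustration.

\[
\begin{aligned}
\Delta A &= A_{\text{hybrid}} - A_{\text{SVM}} = 98.01 - 96.30 = 1.71 \ \text{percentage points},\\
E_{\text{SVM}} &= 100 - 96.30 = 3.70\,\%, \qquad E_{\text{hybrid}} = 100 - 98.01 = 1.99\,\%,\\
\frac{E_{\text{SVM}} - E_{\text{hybrid}}}{E_{\text{SVM}}} &= \frac{1.71}{3.70} \approx 0.462 .
\end{aligned}
\]

Under these assumptions, the gain of 1.71 percentage points corresponds to roughly 46 % fewer misclassified messages than with SVM alone.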
