Abstract
Spam e-mail is widely regarded as a serious violation of privacy, and it has become a costly and unwanted form of communication. The support vector machine (SVM) has been widely used for e-mail spam classification, yet many studies have shown that its accuracy drops and its running time grows when it must handle very large amounts of data. This paper proposes a hybrid approach to e-mail spam classification that combines the SVM with \(k\)-means clustering. The proposed approach was evaluated experimentally on the standard Spambase dataset to assess its feasibility. Combining the two methods improved the SVM and, accordingly, increased spam classification accuracy: the SVM alone achieved 96.30% accuracy, whereas the proposed hybrid of SVM and \(k\)-means clustering achieved 98.01%. In addition, the experimental results on the Spambase dataset showed that the improved SVM (ESVM) significantly outperforms the standard SVM and many other recent spam classification methods.

Keywords: \(k\)-means clustering; Mechanism; Non-spam; Spam detection; SVM; Spam; \(t\) test; Coefficient correlation