Building High-Performance Classifiers Using Positive and Unlabeled Examples for Text Classification
  • Authors: Ting Ke (19)
    Bing Yang (19)
    Ling Zhen (19)
    Junyan Tan (19)
    Yi Li (20)
    Ling Jing (19)
  • Keywords: text classification; PU learning; SVM
  • Series: Lecture Notes in Computer Science
  • Year: 2012
  • Volume: 7368
  • Issue: 1
  • Pages: 196-204
  • Full-text size: 218 KB
  • References:
    1. Cortes, C., Vapnik, V.: Support vector network. Mach. Learn. 20, 273–297 (1995)
    2. Fung, G.P.C., Yu, J.X., Lu, H., Yu, P.S.: Text Classification without Negative Examples Revisit. IEEE Transactions on Knowledge and Data Engineering 18(1), 6–20 (2006)
    3. Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)
    4. Manevitz, L., Yousef, M.: One-class SVMs for document classification. J. Mach. Learn. Res. 2, 139–154 (2001)
    5. Lang, K.: Newsweeder: Learning to filter netnews. In: Proceedings of the 12th International Machine Learning Conference, Lake Tahoe, US, pp. 331–339 (1995)
    6. Lee, W.S., Liu, B.: Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression. In: Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, pp. 448–455 (2003)
    7. Li, X., Liu, B.: Learning to Classify Text Using Positive and Unlabeled Data. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, pp. 587–594 (2003)
    8. Li, X.-L., Liu, B., Ng, S.-K.: Learning to Classify Documents with Only a Small Positive Training Set. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 201–213. Springer, Heidelberg (2007)
    9. Li, X., Liu, B., Ng, S.: Negative Training Data can be Harmful to Text Classification. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Massachusetts, USA, pp. 218–228 (2010)
    10. Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially Supervised Classification of Text Documents. In: Proceedings of the 19th International Conference on Machine Learning, Sydney, Australia, pp. 387–394 (2002)
    11. Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building Text Classifiers Using Positive and Unlabeled Examples. In: Proceedings of the 3rd IEEE International Conference on Data Mining, Melbourne, Florida, United States, pp. 179–188 (2003)
    12. Nigam, K., McCallum, A.K., Thrun, S.: Learning to Classify Text from Labeled and Unlabeled Documents. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 792–799. AAAI Press, United States (1998)
    13. Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text Classification from Labeled and Unlabeled Documents Using EM. Mach. Learn. 39, 103–134 (2000)
    14. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing Surveys 34, 1–47 (2002)
    15. Yu, H., Han, J., Chang, K.C.C.: PEBL: Positive Example-Based Learning for Web Page Classification Using SVM. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 239–248. ACM, United States (2002)
  • Author affiliations:
    19. Department of Applied Mathematics, College of Science, China Agricultural University, 100083, Beijing, P.R. China
    20. Department of Mathematics, School of Science, Beijing University of Posts and Telecommunications, 100876, Beijing, P.R. China
Abstract
This paper studies the problem of building text classifiers using only positive and unlabeled examples. Many techniques have been proposed for this problem; among them, Biased-SVM is a popular method whose classification performance exceeds that of most two-step techniques. In this paper, an improved iterative classification approach is proposed as an extension of Biased-SVM. The first iteration of the proposed approach is Biased-SVM itself; subsequent iterations identify confident positive examples among the unlabeled examples, and an extra penalty factor is then used to weight the classification errors on these confident positive examples. Experiments show that the approach is effective for text classification and outperforms Biased-SVM and other two-step techniques.
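To make the described procedure concrete, the following is a minimal Python sketch of the iterative biased-SVM idea summarized above, not the authors' implementation. It assumes scikit-learn's SVC; the linear kernel, the penalty values, the confidence threshold, and the iteration count are illustrative assumptions rather than settings from the paper.

# Sketch of iterative biased SVM for PU learning (illustrative assumptions only):
# unlabeled documents start as negatives with a small error penalty; later rounds
# relabel "confident positives" and give their errors an extra penalty factor.
import numpy as np
from sklearn.svm import SVC

def iterative_biased_svm(X_pos, X_unl, c_pos=10.0, c_unl=1.0, c_conf=5.0,
                         n_iter=3, threshold=0.5):
    n_pos = len(X_pos)
    X = np.vstack([X_pos, X_unl])
    y = np.hstack([np.ones(n_pos), -np.ones(len(X_unl))])
    # Per-example error penalties: large for labeled positives, small for unlabeled.
    w = np.hstack([np.full(n_pos, c_pos), np.full(len(X_unl), c_unl)])

    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y, sample_weight=w)                 # iteration 1: biased SVM

    for _ in range(n_iter - 1):
        scores = clf.decision_function(X[n_pos:])  # score the unlabeled documents
        confident = scores > threshold             # pick confident positives
        y[n_pos:][confident] = 1                   # relabel them as positive
        w[n_pos:][confident] = c_conf              # apply the extra penalty factor
        clf = SVC(kernel="linear", C=1.0)
        clf.fit(X, y, sample_weight=w)             # retrain on the updated labels
    return clf

# Toy usage: random feature vectors standing in for document vectors (e.g. TF-IDF).
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(20, 5))
X_unl = rng.normal(0.0, 1.0, size=(80, 5))
model = iterative_biased_svm(X_pos, X_unl)
print(model.predict(X_unl[:5]))

In practice the iteration could also stop once no new confident positives are found; a fixed iteration count is used here only to keep the sketch short.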
