Towards Scalable Emotion Classification in Microblog Based on Noisy Training Data
  • Keywords: Emotion classification; Data cleaning; Hashtag; k-NN
  • Publication title: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 10035
  • Issue: 1
  • Pages: 399-410
  • Full-text size: 1,298 KB
  • Author affiliations: Minglei Li (18)
    Qin Lu (18)
    Lin Gui (19)
    Yunfei Long (18)

    18. Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
    19. Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
  • Series title: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data
  • ISBN: 978-3-319-47674-2
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
  • Volume sort order: 10035
Abstract
The availability of labeled corpora is of great importance for emotion classification tasks. Because manual labeling is too time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labeled training data from microblogs. However, inconsistency and noise in the annotation can adversely affect data quality and thus the performance of a classifier trained on it. In this paper, we propose a classification framework that allows naturally annotated data to be used as additional training data and employs a k-NN graph based data cleaning method to remove noise once enough noisy data has accumulated. Evaluation on the NLP&CC2013 Chinese Weibo emotion classification dataset shows that our approach performs 15.8% better than directly using the noisy data without noise filtering. Adding the filtered hashtag-labeled data to an existing high-quality training set improves performance by 3.7% over using the high-quality training data alone.
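
The abstract does not describe how the k-NN graph is built or which rule decides that a sample is noise, so the following is only a minimal sketch of the general idea of neighbour-agreement filtering and not the authors' method. It assumes each microblog post has already been turned into a numeric feature vector, uses cosine similarity, and keeps a sample only if enough of its k nearest neighbours share its hashtag-derived label. The names `cosine`, `knn_filter`, `k`, and `min_agreement`, as well as the toy data, are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_filter(
    vectors: Sequence[Sequence[float]],
    labels: Sequence[str],
    k: int = 3,
    min_agreement: float = 0.5,
) -> List[int]:
    """Return indices of samples whose label is shared by at least
    `min_agreement` of their k nearest neighbours (assumed rule)."""
    kept = []
    for i, vec in enumerate(vectors):
        # Rank every other sample by similarity to sample i, keep the top k.
        neighbours = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: cosine(vec, vectors[j]),
            reverse=True,
        )[:k]
        if not neighbours:
            continue
        agree = sum(1 for j in neighbours if labels[j] == labels[i])
        if agree / len(neighbours) >= min_agreement:
            kept.append(i)
    return kept


# Toy data: two clusters plus one sample whose label disagrees with its cluster.
vectors = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],   # cluster labelled "happy"
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],   # cluster labelled "sad"
    [0.95, 0.05],                         # sits in the "happy" cluster, labelled "sad"
]
labels = ["happy", "happy", "happy", "sad", "sad", "sad", "sad"]

kept = knn_filter(vectors, labels, k=3, min_agreement=0.5)
print(kept)
```

In this toy run the last sample is dropped because none of its three nearest neighbours carries its label, while the six consistent samples are kept. This brute-force version compares every pair of samples, so a larger corpus would need an approximate nearest-neighbour index instead.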
