Towards Scalable Emotion Classification in Microblog Based on Noisy Training Data

Keywords: Emotion classification; Data cleaning; Hashtag; k-NN
Journal: Lecture Notes in Computer Science
Publication year: 2016
Volume: 10035
Issue: 1
Pages: 399-410
Full text size: 1,298 KB
Authors and affiliations: Minglei Li (18)
Qin Lu (18)
Lin Gui (19)
Yunfei Long (18)

18. Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
19. Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
Book title: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data
ISBN: 978-3-319-47674-2
Subject category: Computer Science
Subject topics: Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349
Volume order: 10035

Abstract

The availability of labeled corpora is of great importance for emotion classification tasks. Because manual labeling is too time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labeled training data from microblogs. However, inconsistency and noise in the annotation can degrade data quality and thus the performance of a classifier trained on it. In this paper, we propose a classification framework that allows naturally annotated data to be used as additional training data and employs a k-NN graph-based data cleaning method to remove noise once a certain amount of noisy data has accumulated. Evaluation on the NLP&CC2013 Chinese Weibo emotion classification dataset shows that our approach achieves 15.8% better performance than directly using the noisy data without noise filtering. After adding the filtered hashtag-labeled data to an existing high-quality training set, performance increases by 3.7% compared to using the high-quality training data alone.
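
The abstract does not spell out how the k-NN graph is used for cleaning, so the following is only a minimal sketch of one common variant: a sample is kept when enough of its k nearest neighbours carry the same label. The function names (`cosine`, `knn_filter`), the cosine similarity measure, the `min_agreement` threshold, and the toy vectors are all assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_filter(vectors, labels, k=3, min_agreement=0.5):
    """Return indices of samples whose label matches at least
    `min_agreement` of their k most similar neighbours.

    Hypothetical cleaning rule: the paper's actual criterion may differ.
    """
    kept = []
    n = len(vectors)
    for i in range(n):
        # Rank every other sample by similarity to sample i (one k-NN graph row).
        ranked = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        neighbours = [j for _, j in ranked[:k]]
        if not neighbours:
            continue
        agree = sum(labels[j] == labels[i] for j in neighbours)
        if agree / len(neighbours) >= min_agreement:
            kept.append(i)
    return kept


# Toy data: two clusters; sample 3 sits in the "happy" cluster but is tagged "sad".
vectors = [(1, 0), (0.9, 0.1), (0.8, 0.2), (0.95, 0.05),
           (0, 1), (0.1, 0.9), (0.2, 0.8), (0.05, 0.95)]
labels = ["happy", "happy", "happy", "sad", "sad", "sad", "sad", "sad"]
clean_indices = knn_filter(vectors, labels, k=3)  # sample 3 is filtered out
```

In this toy run the mislabeled sample is dropped because none of its three nearest neighbours share its label, while every other sample keeps a neighbour agreement of at least two thirds. A real pipeline would replace the toy vectors with text features and tune `k` and the threshold on held-out data.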
