Leveraging linguistic traits and semi-supervised learning to single out informational content across how-to community question-answering archives
Abstract
Community Question-Answering sites (e.g., Yahoo! Answers) have become large-scale knowledge bases of natural language questions formulated by their own members. In order to provide quick answers, these sites are compelled to make the most of the content stored in their repositories. Researchers have discovered that, on the one hand, many of these services are the confluence of an information-seeking network and a social network that constantly overlap, and on the other hand, how-to questions are frequently published across these platforms. By and large, informational procedural questions are highly likely to expect informational answers, while non-informational manner questions aim at socially interacting with other members of the community. In order to enhance user experience by reducing the delay in answering, these services are encouraged to identify, retrieve and revitalize the content maintained in their knowledge bases. For this purpose, it is key to match the intent of newly posted questions with the intent of the archived answers that will be presented to the asker.
By manually annotating a small number of how-to questions and answers, we carried out an exploratory analysis that unveils a dichotomy in the interaction of these two networks. More precisely, we corroborate previous findings indicating that procedural questions are more likely to bear an informational goal, but our analysis also extends to their answers, and it reveals that they exhibit a more conspicuous confluence. In substance, we find that informational and non-informational answers are very likely to show up regardless of the goal of the question.
Then, we take advantage of this tagged set and of massive unlabelled material to exploit two state-of-the-art single-view semi-supervised approaches aimed at discriminating informational from non-informational how-to content.
Moreover, our proposed models leverage assorted linguistically motivated features, such as sentiment analysis, dependency parsing, and named entity recognition. Our outcomes show that attributes harvested from morphological and sentiment analysis prove to be effective under a semi-supervised framework. At a low annotation cost, these linguistically motivated semi-supervised models reached an accuracy of 84.25% and 74.41% for classifying questions and answers, respectively. In addition, we quantify the impact of automatically detecting informational/non-informational intents on the retrieval of best answers, i.e., an improvement of 4.12% in terms of precision at one.
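For readers unfamiliar with the retrieval metric mentioned above, precision at one is commonly understood as the fraction of questions for which the top-ranked answer is the one marked as best. The following is a minimal sketch under that assumption; the function name, the data layout, and the toy identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def precision_at_one(rankings: Dict[str, List[str]],
                     best_answers: Dict[str, str]) -> float:
    """Fraction of questions whose top-ranked answer is the best answer.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if not rankings:
        raise ValueError("no rankings to evaluate")
    hits = 0
    for question_id, ranked_answer_ids in rankings.items():
        # A question with an empty ranking counts as a miss.
        if ranked_answer_ids and ranked_answer_ids[0] == best_answers.get(question_id):
            hits += 1
    return hits / len(rankings)


# Toy data (made up for this sketch): ranked answer ids per question,
# and the id of the answer marked as best for each question.
toy_rankings = {
    "q1": ["a3", "a1"],  # best answer ranked first: hit
    "q2": ["a7", "a9"],  # best answer ranked second: miss
    "q3": ["a4"],        # hit
    "q4": [],            # nothing retrieved: miss
}
toy_best = {"q1": "a3", "q2": "a9", "q3": "a4", "q4": "a2"}

print(precision_at_one(toy_rankings, toy_best))  # 2 hits out of 4 questions
```

Comparing this score for a retrieval system with and without the intent classifier would give a difference of the kind reported above; the abstract does not say whether the 4.12% figure is absolute or relative.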
