Identification of Multi-Focal Questions in Question and Answer Reports
  • Authors: Mona Mohamed Zaki Ali (18) (19)
    Goran Nenadic (18)
    Babis Theodoulidis (20)
  • Keywords: Question Classification; Question Analysis; Content Analysis; Data Quality; Text Mining; Data Mining; Machine Learning; Rule-based Methods
  • Publication: Lecture Notes in Computer Science
  • Publication year: 2014
  • Volume: 8455
  • Issue: 1
  • Pages: 126-137
  • Author affiliations: Mona Mohamed Zaki Ali (18) (19)
    Goran Nenadic (18)
    Babis Theodoulidis (20)

    18. School of Computer Science, The University of Manchester, Manchester, UK
    19. Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
    20. Manchester Business School, The University of Manchester, Manchester, UK
  • ISSN:1611-3349
Abstract
A significant amount of business and scientific data is collected via question and answer reports. However, these reports often suffer from various data quality issues. In many cases, questionnaires contain questions that require multiple answers, which we argue is a potential source of poor-quality answers. This paper introduces multi-focal questions and proposes a model for identifying them. The model consists of three phases: question pre-processing, feature engineering and question classification. We use six types of features: lexical/surface features; Part-of-Speech; readability; question structure, wording and placement features; question response type and format features; and question focus. A comparative study of three machine learning algorithms (Bayes Net, Decision Tree and Support Vector Machine) is performed on a dataset of 150 questions obtained from the Carbon Disclosure Project, achieving an accuracy of 91%.
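
The three phases named in the abstract (pre-processing, feature engineering, classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cue-word lists, the four surface features, and the hand-written rule standing in for a trained classifier are all assumptions made for illustration, since the abstract does not specify them.

```python
from __future__ import annotations

import re

# Hypothetical cue lists; the paper's actual lexicons are not given in the abstract.
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
COORDINATORS = {"and", "or"}


def preprocess(question: str) -> list[str]:
    """Phase 1: lowercase the question and split it into word and '?' tokens."""
    return re.findall(r"[a-z']+|\?", question.lower())


def extract_features(tokens: list[str]) -> dict[str, int]:
    """Phase 2: compute a few illustrative lexical/surface features."""
    return {
        "n_tokens": sum(1 for t in tokens if t != "?"),
        "n_question_marks": tokens.count("?"),
        "n_wh_words": sum(1 for t in tokens if t in WH_WORDS),
        "n_coordinators": sum(1 for t in tokens if t in COORDINATORS),
    }


def is_multi_focal(features: dict[str, int]) -> bool:
    """Phase 3: a toy rule standing in for a trained classifier (assumption)."""
    # Several question marks or several wh-words suggest more than one focus.
    if features["n_question_marks"] > 1 or features["n_wh_words"] > 1:
        return True
    # A single wh-word combined with a coordinator may also ask for two things.
    return features["n_wh_words"] == 1 and features["n_coordinators"] >= 1


if __name__ == "__main__":
    for q in (
        "What are your emissions, and how do you measure them?",
        "What is your reporting year?",
    ):
        print(q, "->", is_multi_focal(extract_features(preprocess(q))))
```

In a setting closer to the one described, the feature dictionary would presumably feed a learned model such as a decision tree rather than a fixed rule, and would cover all six feature types listed above.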
