Near-synonym substitution using a discriminative vector space model

Details

Authors: Liang-Chih Yu (lcyu@saturn.yzu.edu.tw); Lung-Hao Lee (lhlee@ntnu.edu.tw); Jui-Feng Yeh (ralph@mail.ncyu.edu.tw); Hsiu-Min Shih (culy33@yahoo.com.tw); Yu-Ling Lai (yllai@math.ccu.edu.tw)
Keywords: Natural language processing; Lexical substitution; Near-synonym learning; Discriminative training; Vector space model
Journal: Knowledge-Based Systems
Year: 2016
Issue: Complete
DOI: 10.1016/j.knosys.2016.05.025
Source: Elsevier
Type: Journal article

Abstract

Near-synonyms are fundamental and useful knowledge resources for computer-assisted language learning (CALL) applications. For example, in online language learning systems, learners may need to express a similar meaning using different words. However, it is usually difficult to choose a near-synonym that fits a given context because the differences between near-synonyms are hard to grasp in practice, especially for second language (L2) learners. Accordingly, it is worth developing algorithms that verify whether near-synonyms match given contexts. Such algorithms could be used in applications that help L2 learners discover the collocational differences between near-synonyms. We propose a discriminative vector space model (DT-VSM) for the near-synonym substitution task and treat the task as a classification problem. The model has two components: a vector space model and discriminative training. The vector space model serves as a baseline classifier that assigns each test example to one of the near-synonyms in a given near-synonym set. A discriminative training technique is then employed to improve the vector space model by distinguishing positive and negative features for each near-synonym. Experimental results show that the DT-VSM achieves higher accuracy than both the pointwise mutual information and n-gram-based methods used in previous studies.
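The two components described in the abstract can be illustrated with a minimal sketch. The abstract does not give the feature weighting or the training rule, so everything below is an assumption made for illustration: the bag-of-words count vectors, the cosine similarity, the perceptron-style update in `discriminative_train`, the `step` and `epochs` parameters, and the toy data for the near-synonym set {strong, powerful}. It is not the authors' DT-VSM implementation.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_vsm(examples):
    """Build one context-word count vector per near-synonym.

    `examples` is a list of (context_words, near_synonym) pairs.
    """
    vectors = defaultdict(Counter)
    for context, label in examples:
        vectors[label].update(context)
    return {label: dict(vec) for label, vec in vectors.items()}


def classify(vectors, context):
    """Return the near-synonym whose vector is most similar to the context."""
    query = dict(Counter(context))
    # sorted() makes tie-breaking deterministic.
    return max(sorted(vectors), key=lambda label: cosine(vectors[label], query))


def discriminative_train(vectors, examples, step=0.5, epochs=5):
    """Perceptron-style adjustment (an assumed stand-in for the paper's rule).

    For each misclassified training example, the context words are boosted
    in the correct near-synonym's vector (positive features) and reduced in
    the wrongly predicted one (negative features), never below zero.
    """
    for _ in range(epochs):
        for context, gold in examples:
            predicted = classify(vectors, context)
            if predicted == gold:
                continue
            for word in context:
                vectors[gold][word] = vectors[gold].get(word, 0.0) + step
                vectors[predicted][word] = max(
                    0.0, vectors[predicted].get(word, 0.0) - step
                )
    return vectors


# Hypothetical toy training data: (context words, correct near-synonym).
train = [
    (["cup", "of", "hot", "black", "tea"], "strong"),
    (["very", "engine"], "powerful"),
    (["very", "wind", "engine"], "powerful"),
    (["very", "wind"], "strong"),
]

vectors = build_vsm(train)
before = classify(vectors, ["very", "wind"])   # baseline VSM prediction
vectors = discriminative_train(vectors, train)
after = classify(vectors, ["very", "wind"])    # prediction after adjustment
print(before, after)
```

On this toy data the baseline assigns the context "very wind" to "powerful", because that near-synonym's vector is dominated by "very" and "wind", while the adjusted vectors assign it to "strong". A real system would draw contexts from a large corpus and would likely use a weighting scheme and training objective different from the ones assumed here.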