Graph-Based Approach for Cross Domain Text Linking

Keywords: Text graph; Cross domain text; Text linking; Semantic similarity
Journal title: Lecture Notes in Computer Science
Publication year: 2015
Volume: 9461
Issue: 1
Pages: 151-160
Full-text size: 810 KB
Authors: Yu Hu; Tiezheng Nie; Derong Shen; Yue Kou
Author affiliation: College of Information Science and Engineering, Northeastern University, Shenyang, China
Series title: Web Technologies and Applications
ISBN: 978-3-319-28121-6
Publication category: Computer Science
Publication subjects: Artificial Intelligence and Robotics; Computer Communication Networks; Software Engineering; Data Encryption; Database Management; Computation by Abstract Devices; Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349

Abstract

Comprehensive analysis of multi-domain texts has had an important effect on text mining. Although the objects described by multi-domain texts belong to different fields, they sometimes partially overlap, and linking text fragments that overlap or complement each other is a necessary step for many tasks, such as entity resolution, information retrieval and text clustering. Previous work on computing text similarity mainly focuses on string-based, corpus-based and knowledge-based approaches. However, cross-domain texts exhibit distinctive features compared with texts from the same domain: (1) entity ambiguity: texts from different domains may contain various references to the same entity; (2) content skewness: cross-domain texts overlap only partially. In this paper, we propose a novel fine-grained approach based on text graphs for evaluating the semantic similarity of cross-domain texts in order to link their similar parts. The experimental results show that our approach is an effective way to discover semantic relationships between cross-domain text fragments.
