Automatic Word Clustering in Russian Texts
  • Authors: Olga Mitrofanova; Anton Mukhin; Polina Panicheva; Vyacheslav Savitsky
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2007
  • Volume: 4629
  • Issue: 1
  • Pages: 85-91
  • Full-text size: 184 KB
  • Category: Computer Science
  • Subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
Abstract
The paper describes the development and application of an automatic word clustering (AWC) tool for processing Russian texts of various types. The tool is designed to be flexible and compatible with other linguistic resources. Building it requires a computer implementation of latent semantic analysis (LSA) combined with clustering algorithms, and Python-based software has been developed for this purpose. The main procedures performed by the AWC tool are segmentation of input texts, context analysis, co-occurrence matrix construction, and agglomerative and K-means clustering. Special attention is given to experimental results on clustering words in raw texts under varying parameters.
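
The procedures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' AWC tool: the function names, the context window size, the cosine similarity measure, and the average-linkage merge rule are all assumptions chosen for illustration. The sketch covers segmentation, co-occurrence counting, and agglomerative clustering only; the LSA dimensionality reduction (which would need an SVD routine) and K-means are left out to keep it within the standard library.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into word tokens (Latin or Cyrillic)."""
    return re.findall(r"[a-zа-яё]+", text.lower())


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions.

    The window size is a hypothetical parameter, not one taken from the paper.
    """
    vectors = {}
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors, n_clusters):
    """Merge the two most similar clusters until `n_clusters` remain.

    Similarity between clusters is the average pairwise cosine similarity
    (average linkage), an assumption made for this sketch.
    """
    clusters = [[word] for word in sorted(vectors)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[x], vectors[y])
                        for x in clusters[a] for y in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

A typical call chain would be `agglomerate(cooccurrence_vectors(tokenize(text)), n_clusters=5)`, where `text` is a raw Russian text and the number of clusters is one of the parameters to vary. The pairwise search is quadratic in the vocabulary at every merge, so this form only suits small vocabularies.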
