Research on a Text Clustering Algorithm Based on Improved Particle Swarm Optimization and K-Means
  • English title: Research on Text Clustering Algorithm Based on Improved Particle Swarm Optimization and K-Means
  • Authors: NIU Yong-li (钮永莉); WU Bin (武斌)
  • Affiliation: Department of Information Engineering, Chuzhou Vocational and Technical College
  • Keywords: K-Means algorithm; MPK-Clusters algorithm; PSO-KMeans algorithm; inertia weight
  • Journal code: GXJB
  • Journal: Journal of Lanzhou University of Arts and Science (Natural Science Edition)
  • Publication date: 2019-07-10
  • Publisher: Journal of Lanzhou University of Arts and Science (Natural Science Edition)
  • Year: 2019
  • Issue: v.33; No.131
  • Funding: Chuzhou Vocational and Technical College college-level key research projects (YJZ-2018-19), (YJZ-2018-04)
  • Language: Chinese
  • Page: GXJB201904009
  • Page count: 4
  • CN: 04
  • ISSN: 62-1212/N
  • Classification number: 49-52
Abstract
In text clustering, plain K-Means performs poorly on large-volume, high-dimensional, unstructured text data and easily becomes trapped in a local optimum. This paper improves the particle swarm optimization (PSO) algorithm by proposing a mechanism that adjusts the inertia weight nonlinearly and dynamically. The improved PSO is then combined with K-Means, which has strong local search ability, to form a text clustering algorithm based on improved PSO and K-Means (MPK-Clusters). An experimental comparison of three algorithms shows that the new algorithm outperforms the other two in accuracy, recall, and F-value, achieving better text clustering results.
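The combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' MPK-Clusters implementation: the abstract does not state the inertia-weight formula, so the quadratic decay in `inertia_weight` is an assumption, as are all function names, the parameters (`w_max`, `w_min`, `c1`, `c2`, `n_particles`, `t_max`), and the use of the sum of squared distances as the fitness. Documents are assumed to be already converted to numeric vectors (for example TF-IDF weights).

```python
import random

def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    # Assumed schedule: quadratic (nonlinear) decay from w_max to w_min.
    return w_min + (w_max - w_min) * (1.0 - t / t_max) ** 2

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    # Index of the centroid closest to the point.
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def sse(centroids, data):
    # Fitness: sum of squared distances to the nearest centroid (lower is better).
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)

def kmeans_refine(centroids, data, n_iter=20):
    # Plain Lloyd iterations started from the PSO result (local search stage).
    centroids = [list(c) for c in centroids]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in data]
        for j in range(len(centroids)):
            members = [p for p, l in zip(data, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in data]

def pso_kmeans(data, k, n_particles=10, t_max=30, c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    # Each particle holds k candidate centroids, initialised from data points.
    pos = [[list(p) for p in rng.sample(data, k)] for _ in range(n_particles)]
    vel = [[[0.0] * len(c) for c in particle] for particle in pos]
    pbest = [[list(c) for c in particle] for particle in pos]
    pbest_fit = [sse(particle, data) for particle in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [list(c) for c in pbest[g]], pbest_fit[g]
    for t in range(t_max):
        w = inertia_weight(t, t_max)  # large early (explore), small late (exploit)
        for i in range(n_particles):
            for j in range(k):
                for d in range(len(pos[i][j])):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(pos[i], data)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [list(c) for c in pos[i]], fit
    # Hand the best global position to K-Means for local refinement.
    return kmeans_refine(gbest, data)
```

Calling `pso_kmeans(vectors, k)` on a list of equal-length numeric lists returns the refined centroids and one cluster label per vector. The split of work reflects the abstract's reasoning: the swarm searches globally for good starting centroids, and K-Means then converges locally from that start.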
