Research on Text Clustering Algorithm Based on Improved Particle Swarm Optimization and K-Means


English title: Research on Text Clustering Algorithm Based on Improved Particle Swarm Optimization and K-means
Authors: NIU Yong-li (钮永莉); WU Bin (武斌)
Keywords: K-means algorithm; MPK-Clusters algorithm; PSO-KMeans algorithm; inertia weight
Journal code: GXJB
Journal: Journal of Lanzhou University of Arts and Science (Natural Science Edition)
Affiliation: Department of Information Engineering, Chuzhou Vocational and Technical College
Publication date: 2019-07-10
Publisher: Journal of Lanzhou University of Arts and Science (Natural Science Edition)
Year: 2019
Issue: v.33; No.131
Funding: Key university-level research projects of Chuzhou Vocational and Technical College (YJZ-2018-19, YJZ-2018-04)
Language: Chinese
Page: GXJB201904009
Page count: 4
CN: 04
ISSN: 62-1212/N
Classification number: 49-52

Abstract

For large-scale, high-dimensional, unstructured text data, plain K-Means clustering performs poorly and is prone to getting trapped in local optima. This paper improves the particle swarm optimization (PSO) algorithm by proposing a nonlinear, dynamically adjusted inertia weight mechanism. The improved PSO is then combined with K-Means, which has strong local search ability, to form a text clustering algorithm (MPK-Clusters) based on improved PSO and K-Means. An experimental comparison of three algorithms shows that the new algorithm outperforms the other two in accuracy, recall, and F-value, and achieves better text clustering results.
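The general idea of pairing a PSO global search with a K-Means local refinement can be sketched as follows. This is only an illustrative sketch, not the authors' MPK-Clusters implementation. The abstract does not state the inertia weight formula, so the quadratic decay in `inertia_weight` is an assumption, as are all function names, the parameter defaults (`w_max`, `w_min`, `c1`, `c2`, swarm size), and the choice of sum of squared errors as the fitness. The input `X` stands in for document vectors (for example TF-IDF rows), which is also an assumption.

```python
import numpy as np


def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Assumed nonlinear schedule: quadratic decay from w_max to w_min."""
    return w_max - (w_max - w_min) * (t / t_max) ** 2


def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def kmeans_refine(X, centroids, n_iter=20):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)


def pso_kmeans(X, k, n_particles=10, n_iter=30, c1=2.0, c2=2.0, seed=0):
    """PSO searches over sets of k centroids; K-Means refines the best set."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each particle encodes k candidate centroids drawn from the data points.
    pos = np.stack(
        [X[rng.choice(n, size=k, replace=False)] for _ in range(n_particles)]
    ).astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([sse(X, p) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for t in range(n_iter):
        w = inertia_weight(t, n_iter)  # large early (explore), small late (exploit)
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([sse(X, p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    # Local refinement of the best centroid set found by the swarm.
    return kmeans_refine(X, gbest)
```

A call such as `centroids, labels = pso_kmeans(X, k=2)` returns the refined centroids and one cluster label per row of `X`. Under the assumed schedule the weight stays close to `w_max` in early iterations and drops faster toward `w_min` near the end, which is one common way to favor exploration first and convergence later; the schedule in the paper may differ.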

