Gene expression data classification method based on FCBF feature selection and ensemble optimized learning
  • Original title (Chinese): 基于FCBF特征选择和集成优化学习的基因表达数据分类算法
  • Author: Ma Chao (马超)
  • Affiliation: School of Digital Media, Shenzhen Institute of Information Technology
  • Keywords: feature selection; ensemble learning; microarray gene expression data; crow search algorithm; kernel extreme learning machine
  • Journal: Application Research of Computers (计算机应用研究)
  • Online publication date: 2018-07-09 15:11
  • Year: 2019
  • Issue: 10
  • Pages: 112-117 (6 pages)
  • Funding: National Natural Science Foundation of China Youth Science Fund (61303113); Natural Science Foundation of Guangdong Province (2016A030310072); Shenzhen Science and Technology Plan project (KJYY20170724152553858); Guangdong Provincial Department of Education key platform and research project, characteristic innovation category (2017GWTSCX040); Shenzhen 2017 planning projects (ybzz17011, ybzz17009, zdzz17005); institute-level research project (QN201716)
  • Language: Chinese
  • CN: 51-1196/TP
  • ISSN: 1001-3695
  • CLC classification: TP181; Q811.4
Abstract
To address the high dimensionality, small sample size, high redundancy, and heavy noise of microarray gene expression data, this paper proposes FICS-EKELM, a classification algorithm that combines FCBF feature selection with ensemble optimized learning. First, the fast correlation-based filter (FCBF) removes irrelevant features and noise and retains a feature set strongly correlated with the class labels. Second, bootstrap sampling generates multiple training subsets, and on each subset an improved crow search algorithm (ICS) simultaneously selects an optimal feature subset and tunes the parameters of a kernel extreme learning machine (KELM) classifier. The resulting base classifiers are then combined into an ensemble model that classifies the target data. In addition, multithreaded execution on a multi-core platform, implemented with OpenMP, further accelerates the search and optimization process. Experiments on six public gene expression datasets show that the algorithm achieves competitive classification performance with a small number of selected genes and significantly outperforms existing and comparable methods, making it an effective approach to high-dimensional data classification.
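The pipeline the abstract describes (FCBF prefiltering, bootstrap sampling, crow-search-driven KELM tuning, majority-vote ensembling) can be illustrated with a compact sketch. The Python code below is a simplified, illustrative reconstruction, not the paper's implementation: the prefilter only ranks genes by symmetric uncertainty (the measure FCBF builds on) and omits FCBF's redundancy-elimination step, the crow search tunes only the KELM regularization parameter C and RBF kernel width gamma on a log scale (the paper's improved crow search also encodes the per-subset feature mask), the fitness is plain resubstitution accuracy rather than the paper's criterion, and the OpenMP-style parallelism is not shown. All function names and numeric settings are assumptions made for this example.

    import numpy as np
    from sklearn.metrics import mutual_info_score
    from sklearn.preprocessing import KBinsDiscretizer

    def symmetric_uncertainty(x, y):
        """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for discrete x and y (natural log)."""
        def entropy(v):
            _, counts = np.unique(v, return_counts=True)
            p = counts / counts.sum()
            return -np.sum(p * np.log(p))
        hx, hy = entropy(x), entropy(y)
        return 0.0 if hx + hy == 0 else 2.0 * mutual_info_score(x, y) / (hx + hy)

    def su_prefilter(X, y, keep=200, bins=10):
        """Stand-in for the FCBF stage: keep the genes with the highest SU w.r.t. the class."""
        Xd = KBinsDiscretizer(n_bins=bins, encode="ordinal", strategy="uniform").fit_transform(X)
        su = [symmetric_uncertainty(Xd[:, j].astype(int), y) for j in range(X.shape[1])]
        return np.argsort(su)[::-1][:keep]

    def kelm_train(X, y, n_classes, C, gamma):
        """Closed-form KELM with an RBF kernel: beta = (I / C + K)^{-1} T."""
        T = np.eye(n_classes)[y]                                   # one-hot targets
        sq = np.sum(X ** 2, axis=1)
        K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
        beta = np.linalg.solve(np.eye(len(X)) / C + K, T)
        return X, beta, gamma

    def kelm_predict(model, Xq):
        Xtr, beta, gamma = model
        d = np.sum(Xq ** 2, 1)[:, None] + np.sum(Xtr ** 2, 1)[None, :] - 2.0 * Xq @ Xtr.T
        return np.argmax(np.exp(-gamma * d) @ beta, axis=1)

    def crow_search(fitness, bounds, n_crows=10, iters=30, fl=2.0, ap=0.1, rng=None):
        """Basic crow search: follow a random crow's memory with flight length fl,
        or jump to a random position with awareness probability ap (maximization)."""
        rng = np.random.default_rng(0) if rng is None else rng
        lo, hi = bounds[:, 0], bounds[:, 1]
        pos = rng.uniform(lo, hi, size=(n_crows, len(lo)))
        mem, mem_fit = pos.copy(), np.array([fitness(p) for p in pos])
        for _ in range(iters):
            for i in range(n_crows):
                j = rng.integers(n_crows)
                if rng.random() >= ap:                             # crow j unaware: follow its memory
                    cand = pos[i] + rng.random() * fl * (mem[j] - pos[i])
                else:                                              # crow j aware: move randomly
                    cand = rng.uniform(lo, hi)
                pos[i] = np.clip(cand, lo, hi)
                f = fitness(pos[i])
                if f > mem_fit[i]:
                    mem[i], mem_fit[i] = pos[i], f
        return mem[np.argmax(mem_fit)]

    def fics_ekelm_fit_predict(Xtr, ytr, Xte, n_models=5, rng=None):
        """Toy end-to-end run: SU prefilter -> bootstrap subsets -> crow-search-tuned
        KELM base classifiers -> majority vote over the test predictions."""
        rng = np.random.default_rng(0) if rng is None else rng
        classes, y = np.unique(ytr, return_inverse=True)
        genes = su_prefilter(Xtr, y, keep=min(200, Xtr.shape[1]))
        Xtr, Xte = Xtr[:, genes], Xte[:, genes]
        votes = []
        for _ in range(n_models):
            idx = rng.integers(len(Xtr), size=len(Xtr))            # bootstrap sample
            Xb, yb = Xtr[idx], y[idx]

            def fitness(p):                                        # resubstitution accuracy as a stand-in
                m = kelm_train(Xb, yb, len(classes), 10 ** p[0], 10 ** p[1])
                return np.mean(kelm_predict(m, Xb) == yb)

            best = crow_search(fitness, np.array([[-2.0, 4.0], [-4.0, 1.0]]), rng=rng)
            model = kelm_train(Xb, yb, len(classes), 10 ** best[0], 10 ** best[1])
            votes.append(kelm_predict(model, Xte))
        maj = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, np.stack(votes))
        return classes[maj]                                        # map vote indices back to original labels

In a realistic setting the resubstitution-accuracy fitness would be replaced by cross-validation accuracy, optionally penalized by the number of selected genes, which is closer to what wrapper-style gene selection schemes of this kind optimize.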
