A feature selection method using improved regularized linear discriminant analysis
  • Authors: Alok Sharma (1) (2) (3)
    Kuldip K. Paliwal (2)
    Seiya Imoto (1)
    Satoru Miyano (1)
  • Keywords: Linear discriminant analysis (LDA); Regularized LDA; Feature/gene selection; Classification accuracy
  • Journal: Machine Vision and Applications
  • Publication year: 2014
  • Publication date: April 2014
  • Volume: 25
  • Issue: 3
  • Pages: 775-786
  • Full text size: 674 KB
  • Affiliations: Alok Sharma (1) (2) (3)
    Kuldip K. Paliwal (2)
    Seiya Imoto (1)
    Satoru Miyano (1)

    1. Laboratory of DNA Information Analysis, University of Tokyo, Tokyo, Japan
    2. School of Engineering, Griffith University, Brisbane, Australia
    3. School of Engineering and Physics, University of the South Pacific, Suva, Fiji
  • ISSN: 1432-1769
Abstract
The investigation of genes using data analysis and computer-based methods has gained widespread attention for solving the human cancer classification problem. DNA microarray gene expression datasets are readily used for this purpose. In this paper, we propose a feature selection method that uses an improved regularized linear discriminant analysis technique to select the important genes that are crucial for human cancer classification. Experiments are conducted on several DNA microarray gene expression datasets, and promising results are obtained in comparison with several existing feature selection methods.
