Pattern Analysis Based on Kernel and Soft Computing Methods
Abstract
With the rapid development of digital technology and computers, the information people must process is increasingly high-dimensional and massive. The challenge that follows is how to analyze and exploit these data effectively, a problem shared by pattern recognition, data mining, soft computing, and machine learning. In traditional pattern analysis, the computational complexity of many classification methods grows rapidly with the number of training samples, so processing larger data sets often becomes intractable. Kernel-based pattern analysis is a new theory in the field of pattern analysis; it first appeared in the form of support vector machines and offers classification algorithms that escape the traditional computational and statistical difficulties. Moreover, kernel methods provide a unified framework for reasoning about and operating on data of all types, whether vectors, strings, or more complex objects, and for performing many kinds of pattern analysis. Driven by the information-processing needs of engineering applications, soft computing methods have been widely applied; their theory continues to develop, and combinations with other methods keep appearing, underscoring their importance in data analysis and processing. This dissertation studies the application of kernel methods and soft computing methods to pattern analysis in depth, examines the problems and limitations these methods exhibit in practice, and fuses the algorithms to construct new, more effective pattern analysis algorithms. The dissertation has six chapters; the main work falls into the following four parts:
     The first part studies feature analysis algorithms in the kernel feature space. Kernel functions provide a powerful, principled way to analyze nonlinear relations with well-understood linear algorithms in the kernel feature space. Combining kernel methods, principal component analysis, and linear discriminant analysis, this dissertation proposes the KPL feature analysis method, which preserves the nonlinear structure of a data set while extracting the directions most effective for classification.
     The second part studies manifold learning algorithms in the kernel feature space. First, a kernel-based locally linear embedding method is proposed; it selects the optimal number of nearest neighbors automatically and constructs a uniformly distributed manifold, overcoming locally linear embedding's strong sensitivity to the neighbor count and its requirement that points be distributed fairly evenly on the manifold. In addition, the isometric feature mapping algorithm (ISOMAP) is a widely used low-dimensional embedding method. This dissertation focuses on a key issue: ISOMAP uses local neighborhood information to construct a global embedding of the manifold, and a suitable transformation of its distance matrix can be described as a Gram matrix, so the differences and connections between ISOMAP and Mercer kernels are presented.
     The third part studies support vector machines based on fuzzy theory. First, fuzzy support vector machine theory is investigated. To overcome the SVM's sensitivity to noise and reduce the harm noise does to classification results, a new fuzzy multi-class SVM based on a direct-construction classification method is proposed. It incorporates fuzzy ideas, introduces a fuzzy compensation mechanism, and reconstructs and re-derives the associated optimization problems. Experiments show that the proposed method clearly improves classification accuracy. Second, fuzzy set theory is applied to the construction of SVM kernel functions, yielding a fuzzy-kernel SVM. The method replaces the traditional inner product with the inner product of two samples' fuzzy memberships; a sample's membership value fully characterizes how strongly it belongs to a class, and the membership inner product describes how tightly and how closely two samples are related to that class.
     The fourth part studies optimization algorithms formed by fusing kernel methods and soft computing methods. First, a support vector machine is used to optimize a fuzzy inference system, identifying the key fuzzy rules and pruning redundant ones. Second, kernel parameter selection is a new technique for kernel-based learning algorithms and is crucial to kernel-based pattern analysis; several soft computing methods are used to find optimized SVM parameters. Finally, a genetic algorithm combined with a rough-set reduction algorithm is proposed for optimizing fuzzy neural networks.
Advances in digital technology and computing have led to the collection of high-dimensional, massive data. The puzzle that follows is how to analyze and utilize so much data, a challenge common to pattern analysis, data mining, soft computing, and machine learning. In traditional pattern analysis, the computational complexity of many classifiers increases quickly as the number of training samples grows, so those classifiers often become computationally intractable when applied to large data sets. Kernel-based analysis is a powerful new theory of pattern analysis; it first appeared in the form of support vector machines and overcomes the computational and statistical difficulties that plague traditional learning algorithms. Furthermore, the approach provides a unified framework for reasoning about and operating on data of all types, such as vectors, strings, or more complex objects, enabling the analysis of a wide variety of patterns. Because of the demand for information processing in engineering applications, soft computing methods are used widely; their theory has developed rapidly, and related hybrid theories appear constantly, which increases their importance in data analysis and information processing. This doctoral dissertation investigates the use of kernel methods and soft computing methods in pattern analysis, reveals the practical problems of both families of methods, and uses these techniques to construct novel and effective pattern analysis methods. The dissertation consists of four parts in six chapters:
     Part 1 is devoted to feature analysis algorithms in the kernel feature space. Kernel functions provide a powerful and principled way of analyzing nonlinear relations using well-understood linear algorithms in an appropriate feature space. Combining kernel methods, principal component analysis, and linear discriminant analysis, this dissertation proposes the KPL feature analysis algorithm, which preserves the nonlinear structure of the data while extracting the optimal directions for classification.
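     The dissertation's exact KPL algorithm is not spelled out in the abstract, but one plausible reading of "kernel PCA followed by LDA" can be sketched with standard scikit-learn components: kernel PCA captures the nonlinear structure, and LDA then extracts the most discriminative direction in the resulting kernel feature space. The pipeline below is an illustrative assumption, not the author's implementation; the parameter choices (10 components, gamma=2.0) are arbitrary.

```python
# Illustrative KPL-style pipeline (assumed interpretation, not the
# dissertation's exact algorithm): kernel PCA for nonlinear structure,
# then LDA for the most discriminative direction.
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Step 1: nonlinear mapping via kernel PCA (RBF kernel).
kpca = KernelPCA(n_components=10, kernel="rbf", gamma=2.0)
Z = kpca.fit_transform(X)

# Step 2: LDA in the kernel feature space finds the direction that
# best separates the two classes.
lda = LinearDiscriminantAnalysis(n_components=1)
Z_lda = lda.fit_transform(Z, y)
print(Z_lda.shape)  # (200, 1): one discriminative feature per sample
```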
     Part 2 investigates manifold learning algorithms in the kernel feature space. First, a kernel-based locally linear embedding algorithm for nonlinear dimensionality reduction is proposed. It selects the optimal number of nearest neighbors automatically and constructs a uniformly distributed manifold, thereby overcoming the instability of locally linear embedding, which is highly sensitive to the choice of neighbor count and requires a fairly uniform distribution on the manifold. ISOMAP is one of the most widely used low-dimensional embedding methods. This dissertation addresses a critical issue: ISOMAP uses local neighborhood information to construct a global embedding of the manifold, and a suitable transformation of its distance matrix can be described as a Gram matrix, so the relation between ISOMAP and Mercer kernels is made explicit.
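     The "suitable transformation" referred to here is the classical MDS double-centering of the squared geodesic distance matrix, K = -½ H D² H with H = I - 11ᵀ/n: K plays the role of a Gram matrix, but for geodesic distances it need not be positive semidefinite, which is precisely the gap between ISOMAP and a true Mercer kernel. A minimal sketch (assuming the swiss-roll toy data and k-NN geodesic approximation; not the dissertation's code):

```python
# ISOMAP via double-centering: K = -1/2 * H * D^2 * H acts as a Gram matrix.
# (For geodesic distances K may have negative eigenvalues, so ISOMAP is not
# guaranteed to correspond to a Mercer kernel.)
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=300, random_state=0)

# Geodesic distances approximated by shortest paths on a k-NN graph.
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")
D = shortest_path(graph, method="D", directed=False)

# Double-centering turns squared distances into a Gram-like matrix.
n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * H @ (D ** 2) @ H

# Top eigenvectors of K give the low-dimensional ISOMAP coordinates.
w, V = np.linalg.eigh(K)
idx = np.argsort(w)[::-1][:2]
Y = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
print(Y.shape)  # (300, 2)
```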
     Part 3 studies fuzzy support vector machine methods. First, a novel directly constructed fuzzy multi-class SVM is presented to reduce the sensitivity of the original direct-construction methods to noisy data and to limit the damage noise does to classification results. The proposed method integrates fuzzy ideas, introduces fuzzy compensation, and reconstructs and re-derives the corresponding optimization problems. Experimental results indicate that it achieves higher accuracy than the original direct-construction methods. Second, a support vector machine with a fuzzy kernel is presented, applying fuzzy theory to the SVM's kernel function. The method replaces the traditional inner product with an inner product of two samples' fuzzy membership values, where a sample's membership value characterizes how strongly it belongs to a class, and the membership inner product measures how tightly two samples belong to that class.
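     The core fuzzy SVM idea (in the style of Lin and Wang, 2002, cited in this dissertation) is to give each training sample a membership s_i in [0, 1] and scale the misclassification penalty to C·s_i, so likely noise points influence the boundary less. A minimal sketch under two assumptions of this illustration: membership is derived from distance to the class center, and scikit-learn's sample_weight is used to scale C per sample; this is not the dissertation's exact formulation.

```python
# Fuzzy SVM sketch: per-sample penalty C * s_i via sample_weight.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Fuzzy membership: the closer to the class center, the closer to 1,
# so far-away (likely noisy) points get small weight.
s = np.empty(len(X))
delta = 1e-6
for c in np.unique(y):
    Xc = X[y == c]
    d = np.linalg.norm(Xc - Xc.mean(axis=0), axis=1)
    s[y == c] = 1.0 - d / (d.max() + delta)

clf = SVC(kernel="rbf", C=10.0)
clf.fit(X, y, sample_weight=s)  # effective penalty is C * s_i per sample
print(clf.score(X, y))
```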
     Part 4 is devoted to fusion algorithms that combine kernel methods and soft computing methods into optimized algorithms. First, a support vector machine is used to optimize a fuzzy inference system: the SVM prunes the redundant rules and retains the key ones. Second, kernel parameter selection is a new technique that plays an important role in kernel-based pattern analysis; several soft computing algorithms are applied to the parameter selection problem. Finally, a genetic algorithm and rough set theory are used to optimize the structure of an adaptive neuro-fuzzy inference system.
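     As a toy illustration of the parameter-selection idea, the sketch below runs a small genetic algorithm over (C, gamma) for an RBF SVM, using cross-validation accuracy as the fitness. The GA details (population of 20, truncation selection, Gaussian mutation) and the Iris data are assumptions of this sketch, standing in for the soft computing optimizers the dissertation actually uses.

```python
# Toy GA searching SVM hyperparameters (log10 C, log10 gamma) by CV accuracy.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(ind):
    # ind = (log10 C, log10 gamma); fitness = mean 5-fold CV accuracy.
    clf = SVC(C=10 ** ind[0], gamma=10 ** ind[1])
    return cross_val_score(clf, X, y, cv=5).mean()

# Initial population of (log C, log gamma) pairs.
pop = rng.uniform(low=[-2, -4], high=[4, 1], size=(20, 2))
for gen in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]                  # keep best half
    children = parents + rng.normal(0, 0.3, parents.shape)   # Gaussian mutation
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print("C=%.3g, gamma=%.3g" % (10 ** best[0], 10 ** best[1]))
```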