Low-Rank Matrix Approximation Algorithms and Their Applications
Abstract
With the continuing development of computers and the rapid spread of the Internet, people's ability to collect and store data has improved dramatically. Over the past decade, large volumes of data have accumulated in every field of scientific research and social life. How to analyze these data to uncover the useful information they contain, and how to manage them effectively, have become central topics of shared concern in computer science and applied mathematics. Many machine learning problems (e.g., kernel learning and metric learning) and data management problems (e.g., data differential privacy) can be expressed in matrix form. Practical applications, however, often involve millions or even tens of millions of records or samples, and the space and time complexity of matrix-based data analysis techniques grows roughly quadratically with the problem size, which quickly makes many large-scale applications infeasible. Approximating a target matrix so that data analysis techniques become more accurate and better suited to large-scale practical applications has therefore become a very active topic in machine learning and data management. Inspired by sparse and low-rank techniques such as support vector machines, compressed sensing and non-negative matrix factorization, researchers have developed a series of machine learning and data management algorithms based on matrix analysis.
     This thesis studies low-rank matrix approximation algorithms and their applications in machine learning and data management. In summary, this doctoral thesis makes three main contributions.
     1) We propose a fast algorithm for low-rank quadratic positive semidefinite optimization problems. Low-rank matrix approximation is a highly effective model for large-scale machine learning, because it not only reduces memory and runtime complexity but also provides a natural way to regularize parameters while maintaining high accuracy. In this thesis we study a special class of non-convex quadratic semidefinite matrix optimization problems. Although these problems are non-convex, we exploit their special structure to design a fast algorithm that converges to a local optimum. Moreover, the proposed algorithm is efficient and scales well on a series of concrete problems that matter in machine learning, including sparse eigenvalue problems, distance metric learning and kernel learning. Extensive experiments on UCI datasets show that the proposed algorithm is fast and achieves high test accuracy.
     2) We propose a bilateral greedy low-rank semidefinite optimization algorithm. Many machine learning tasks (such as metric learning and manifold learning) can be reduced to convex semidefinite programs. To meet the needs of large-scale machine learning tasks, how to design a robust, scalable semidefinite programming solver suitable for large problems has long been an open question. In this thesis we propose a novel BILateral Greedy Optimization (BILGO) algorithm for solving general semidefinite programs on large-scale datasets. Unlike previous methods, BILGO adopts a bilateral search strategy in each optimization iteration: the current semidefinite solution is formed as a linear combination of the previous solution and a rank-1 matrix, and this rank-1 matrix can be obtained quickly by computing the leading eigenvector of the descent direction at the current iteration. By optimizing the coefficients of the bilateral combination, BILGO decreases the objective function at every step until the KKT optimality conditions are satisfied, so the algorithm is guaranteed to converge to a global optimum. In fact, for an ε-accurate solution, the number of iterations BILGO needs to converge is O(ε^{-1}). The proposed algorithm thus successfully combines the efficiency of conventional rank-1 update algorithms with that of gradient descent. Finally, and as the main thread of this thesis, BILGO can handle low-rank-constrained semidefinite optimization with only simple modifications; the low-rank techniques in turn improve the original greedy algorithm, making the optimization more robust and faster. Our experimental analysis on large-scale data shows that BILGO performs well on a range of problems.
     3) We propose a fast and accurate optimization algorithm for batch linear query processing under the differential privacy framework. Differential privacy is a promising privacy-preserving model for statistical queries over sensitive data. By injecting random noise into each query result, it makes it provably hard for an adversary to infer from the noisy results whether any individual record is present in the statistical query. The main objective of differentially private query processing is to maximize the accuracy of the query results while satisfying the promised level of differential privacy. Previous work, notably that of Li et al., has suggested that with an appropriate strategy matrix, processing a batch of correlated queries as a whole achieves much higher accuracy than answering them individually. To the best of our knowledge, however, there is currently no effective optimization algorithm that can find a good strategy matrix for an arbitrary query set: existing methods either produce strategy matrices of poor quality (often worse than the most direct approach) or require prohibitively expensive computation even for data of moderate dimensionality. Motivated by this, we propose the low-rank mechanism for optimizing batch linear queries under differential privacy. It is the first practical linear query optimization model under differential privacy based on low-rank approximation. Furthermore, we prove that the accuracy achieved by the low-rank mechanism is close to the theoretical lower bound for any mechanism under differential privacy. Extensive experiments on real data show that the proposed low-rank mechanism greatly outperforms the most representative existing methods. This thesis was typeset with LaTeX.
With the growing popularity of personal computers and the rapid development of the Internet, it has become easier and easier for people to collect and store data. A large amount of data has been accumulated in all fields of scientific research and social life in the last decade. How to analyze these data, as well as how to manage them effectively, has therefore become a common concern in both computer science and applied mathematics. Many machine learning and data management tasks can be expressed in matrix form; examples include kernel learning, metric learning and data differential privacy. However, real-world applications often involve millions or even tens of millions of records or samples, and the space and time complexity of current matrix-based data analysis techniques scales roughly quadratically with the size of the problem, which makes many applications quickly intractable as the problem grows large. Therefore, how to approximate the target matrix and make data analysis techniques more accurate and scalable is a hot topic in the machine learning and data management community. Inspired by recent work on sparse and low-rank matrix approximation techniques such as Support Vector Machines (SVMs), Compressed Sensing (CS) and Non-negative Matrix Factorization (NMF), a variety of matrix-based data analysis approaches have been proposed.
     In this thesis, we discuss low-rank matrix approximation algorithms and their applications in machine learning and data management. Generally speaking, the contributions of this thesis are three-fold.
     1) We propose an efficient algorithm for the low-rank quadratic semidefinite programming problem. Low-rank matrix approximation is an attractive model for large-scale machine learning problems, because it not only reduces memory and runtime complexity, but also provides a natural way to regularize parameters while preserving learning accuracy. In this work, we address a special class of non-convex quadratic matrix optimization problems that require a low-rank positive semidefinite solution. Despite their non-convexity, we exploit the structure of these problems to derive an efficient solver that converges to their local optima. Furthermore, we show that the proposed solution dramatically enhances the efficiency and scalability of a variety of concrete problems of significant interest to the machine learning community, including the top-k eigenvalue problem, distance metric learning and kernel learning. Extensive experiments on UCI benchmarks show the effectiveness and efficiency of the proposed method. (An illustrative sketch of the low-rank factorization idea appears after the third contribution below.)
     2) We propose a bilateral greedy optimization algorithm for large-scale low-rank semidefinite programming. Many machine learning tasks (e.g., metric and manifold learning) can be formulated as convex semidefinite programs. To enable such tasks at large scale, scalability and computational efficiency are desirable properties of a practical semidefinite programming algorithm. In this work, we propose a new BILateral Greedy Optimization (BILGO) approach to solve general semidefinite programs on large-scale datasets. Compared with existing methods, BILGO employs a bilateral search strategy during each optimization iteration: the current semidefinite matrix solution is updated as a bilateral linear combination of the previous solution and a suitable rank-1 matrix, which can be computed efficiently from the leading eigenvector of the descent direction at that iteration. By optimizing the coefficients of the bilateral combination, BILGO reduces the cost function in every iteration, thus guaranteeing convergence to a global optimum. For an ε-accurate solution, the number of BILGO iterations needed for convergence is O(ε^{-1}). The proposed algorithm thus successfully combines the efficiency of conventional rank-1 update algorithms with the effectiveness of gradient descent. Last but not least, BILGO can easily be extended to handle low-rank constraints; the low-rank optimization techniques consistently improve the greedy baseline, yielding a semidefinite programming algorithm that is both robust and fast. Our experimental analysis on large real datasets clearly demonstrates the superiority of BILGO over state-of-the-art methods. (A sketch of the bilateral rank-1 update is likewise given after the list of contributions.)
     3) We propose an efficient and accurate algorithm for optimizing batch linear counting queries under differential privacy. Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result, such that it is provably hard for an adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results while satisfying the privacy guarantees. Previous work, notably the matrix mechanism, has suggested that with an appropriate strategy, processing a batch of correlated queries as a whole achieves considerably higher accuracy than answering them individually. However, to our knowledge there is currently no practical solution for finding such a strategy for an arbitrary query batch; existing methods either return strategies of poor quality (often worse than naive methods) or require prohibitively expensive computations for even moderately large domains. Motivated by this, we propose the Low-Rank Mechanism (LRM), the first practical differentially private technique for answering batch queries with high accuracy, based on a low-rank approximation of the workload matrix. We prove that the accuracy provided by LRM is close to the theoretical lower bound for any mechanism that answers a batch of queries under differential privacy. Extensive experiments using real data demonstrate that LRM consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins. (A sketch of the strategy-matrix mechanics is given below.)
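
The following is a minimal, hedged sketch of the low-rank factorization idea underlying solvers of this kind: the positive semidefinite variable is written as X = L L^T with a thin factor L, and a plain gradient method is run on L. The toy objective ||L L^T - M||_F^2, the step size, the iteration count and all names below are illustrative assumptions, not the formulation or solver developed in the thesis.

# Illustrative sketch (not the thesis algorithm): a simple instance of
#   min_{X PSD, rank(X) <= k}  ||X - M||_F^2
# solved via the factorization X = L L^T and gradient descent on L.
import numpy as np

def low_rank_psd_fit(M, k, step=1e-3, iters=3000, seed=0):
    """Gradient descent on g(L) = ||L L^T - M||_F^2, with L of shape (n, k)."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    L = 0.1 * rng.standard_normal((n, k))
    for _ in range(iters):
        E = L @ L.T - M            # residual (M is assumed symmetric)
        L -= step * (4.0 * E @ L)  # gradient of g with respect to L
    return L @ L.T                 # PSD and rank <= k by construction

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((20, 20))
    M = (A + A.T) / 2                          # symmetric target
    X = low_rank_psd_fit(M, k=3)
    print(np.linalg.matrix_rank(X), np.linalg.norm(X - M) ** 2)

The factorization removes the semidefinite constraint at the price of non-convexity in L, which is exactly the trade-off the first contribution exploits.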
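
Next, a minimal sketch of the bilateral rank-1 update idea behind BILGO, again under illustrative assumptions: the toy convex objective f(X) = ||X - C||_F^2 over the positive semidefinite cone, a two-variable nonnegative least-squares solve for the combination coefficients, and all function and variable names are ours, not the thesis implementation.

# Illustrative sketch of a bilateral rank-1 update in the spirit of BILGO,
# on the toy SDP  min_{X PSD} f(X) = ||X - C||_F^2  with C symmetric.
import numpy as np
from scipy.optimize import nnls

def bilateral_greedy(C, iters=100):
    n = C.shape[0]
    X = np.zeros((n, n))                       # feasible (PSD) starting point
    for _ in range(iters):
        G = 2.0 * (X - C)                      # gradient of f at X
        _, V = np.linalg.eigh(-G)              # eigenvalues ascending; -G is the descent direction
        v = V[:, -1:]                          # leading eigenvector of -G
        P = v @ v.T                            # rank-1 update candidate
        # Optimal nonnegative coefficients of the bilateral combination alpha*X + beta*P:
        # a two-column nonnegative least-squares problem, so f never increases.
        D = np.column_stack([X.ravel(), P.ravel()])
        (alpha, beta), _ = nnls(D, C.ravel())
        X = alpha * X + beta * P               # stays PSD because alpha, beta >= 0
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((15, 15))
    C = (B + B.T) / 2
    X = bilateral_greedy(C)
    print(np.linalg.norm(X - C) ** 2)          # approaches the PSD-projection error of C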
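
Finally, a sketch of the strategy-matrix mechanics behind answering a batch of linear counting queries with a low-rank strategy. Two loud caveats: the Low-Rank Mechanism optimizes the factorization W ≈ B L through its own program, whereas this sketch substitutes a truncated SVD purely as a stand-in; and the sensitivity rule shown assumes counting queries in which one individual changes one cell of the data vector by 1. All names are illustrative.

# Illustrative sketch: answer the query batch W @ x under epsilon-differential
# privacy using a rank-r strategy W ~= B @ L (SVD stand-in, not the LRM optimizer).
import numpy as np

def answer_batch(W, x, epsilon, r, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, :r] * s[:r]                       # m x r reconstruction matrix
    L = Vt[:r, :]                              # r x n strategy matrix
    # L1 sensitivity of the strategy: the largest column L1-norm of L.
    delta = np.abs(L).sum(axis=0).max()
    noisy = L @ x + rng.laplace(scale=delta / epsilon, size=r)
    return B @ noisy                           # noisy answers to all queries in W

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    W = rng.integers(0, 2, size=(64, 256)).astype(float)   # 64 counting queries over 256 bins
    x = rng.integers(0, 100, size=256).astype(float)       # sensitive count vector
    est = answer_batch(W, x, epsilon=1.0, r=16, rng=rng)
    print(np.mean((est - W @ x) ** 2))                      # error = truncation bias + noise

Only r noisy measurements are released, so the injected noise scales with the rank of the strategy rather than with the number of queries, which is the intuition the third contribution builds on.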
