Design and Implementation of a High-Reliability Task Splitting System for a Distributed Virtual Computing Platform

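Author: Kong Zhou (孔舟)
Thesis level: Master's
Discipline: Computer Architecture
Keywords (Chinese): 分布式计算 ; 任务拆分 ; MapReduce ; 双机热备
Keywords (English): distributed computing ; task splitting ; MapReduce ; hot standby
Degree year: 2011
Supervisor: Lu Xianliang (卢显良)
Discipline code: 081201
Degree-granting institution: University of Electronic Science and Technology of China
Thesis submission date: 2011-03-01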

Abstract

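With the rapid development of Internet technology and the rise of mobile communication services, the volume of user data is growing explosively, especially in the telecommunications industry. The continual launch of new value-added telecom services and the steady growth of the telecom customer base have directly driven a rapid increase in telecom traffic. In addition, because users place very high demands on the real-time performance, security, and stability of telecom services, how to process massive data sets in a timely manner has become an active research topic.
     The concept of "cloud computing" was proposed internationally to address the processing of massive data sets. Relatively mature applications in this area currently include Google's cloud computing platform, Amazon's Elastic Compute Cloud, and IBM's Blue Cloud computing platform, but these technologies have not yet been applied in the telecom industry. This thesis mainly studies the application of distributed computing in a telecom project. It first reviews the current state and trends of research at home and abroad and analyzes existing cloud computing architectures; then, based on the specific requirements of the project, it designs the high-reliability task splitting system of the Distributed Virtual Computing Platform (DVCP).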
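     The main work of this thesis is as follows:
     First, the DVCP project is introduced as a whole. The project collects, aggregates, and analyzes large-scale data sets in real time, so that the system can support emerging business services while continuing to provide traditional ones.
     Second, the task splitting module is designed and implemented. The module is built on a service framework based on Epoll and a thread pool: the Epoll asynchronous event-driven mechanism handles network I/O requests, and a thread pool model handles asynchronous disk I/O operations, improving I/O efficiency. Task splitting uses the MapReduce parallel computing architecture, which addresses distributed computation over large data sets; processing telecom data in this way greatly improves the capacity for statistical analysis of the data.
     Third, the system fault-tolerance module is designed and implemented. To ensure high availability, the system adopts a dual-machine hot standby fault-tolerance mechanism, implemented purely in software without shared storage: data (including state values and intermediate results of business data) is backed up in real time from the master device to the standby device and held in the standby's memory. When the master device fails, a fast switchover can be performed.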
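     The system was tested with a load pressure generator. The results show that its functions run correctly and that, while maintaining real-time performance, it can meet the demand of large numbers of users accessing it simultaneously.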
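
To make the MapReduce-style task splitting in the second point concrete, the following is a minimal single-process sketch, not the thesis implementation. The record layout (`caller_id`, `duration_seconds`), the function names, and the per-caller summation are all assumptions chosen for illustration; in a real deployment the chunks would be dispatched to worker nodes rather than processed in a loop.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical call record: (caller_id, duration_seconds).
Record = Tuple[str, int]


def split_task(records: List[Record], n_splits: int) -> List[List[Record]]:
    """Split one large job into at most n_splits roughly equal sub-tasks."""
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_phase(chunk: List[Record]) -> List[Tuple[str, int]]:
    """Map: emit one (key, value) pair per record in the chunk."""
    return [(caller, duration) for caller, duration in chunk]


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(values: List[int]) -> int:
    """Reduce: combine the values of one key (here, total duration)."""
    return sum(values)


def run_job(records: List[Record], n_splits: int) -> Dict[str, int]:
    """Run split -> map -> shuffle -> reduce sequentially in one process."""
    chunks = split_task(records, n_splits)
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return {key: reduce_phase(values) for key, values in grouped.items()}
```

Calling `run_job(records, n_splits)` on a list of such records returns a dictionary of total duration per caller. Because each chunk is mapped independently, the map step is the part that could be spread across machines.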
With the rapid development of Internet technology and the rise of mobile communication services, user data is increasing explosively, especially in the telecommunications industry. The launch of new value-added telecom services and the growing number of telecom customers have led directly to rapid growth in telecom service volume. In addition, telecom services have very high requirements for real-time performance, security, and stability. How to process massive data sets in time has therefore become a hot research issue.
     The concept of "cloud computing" was put forward to address this problem. Companies with relatively mature cloud computing platforms include Google, Amazon, and IBM. However, their platforms have not yet been applied in the telecom industry. This thesis describes how to apply distributed computing in the telecom industry. We survey the current state and development trends of domestic and international research, analyze existing cloud computing architectures, and finally design a high-reliability task splitting system in the DVCP (Distributed Virtual Computing Platform) project according to its specific requirements.
     The main work of the thesis is as follows:
     First, we describe the DVCP system as a whole. It collects, aggregates, and analyzes large-scale data sets in real time without degrading the user experience. It supports new business services while continuing to provide traditional ones.
     Second, we design and implement the task splitting module. This module is built on an architecture of Epoll and a thread pool. The asynchronous event-driven mechanism Epoll handles network I/O requests to improve network I/O efficiency, and the thread pool handles asynchronous disk I/O operations to improve disk I/O efficiency. Splitting tasks with the MapReduce parallel computing framework addresses distributed computation over large data sets, and processing telecommunication data in this way greatly improves the capacity for statistical analysis of the data.
     Third, we design and implement the system fault-tolerance module. To ensure system availability, the system uses hot standby for fault tolerance, implemented as a pure software approach without a shared storage device. Data (including state values and intermediate results of business data) is backed up in real time from the master to the standby device and stored in the standby's memory. When the master fails, the standby immediately takes over and provides services in its place.
     Testing with a load pressure generator shows that the basic functions of the system run normally and that, while ensuring real-time performance, the system can serve a large number of users accessing it simultaneously.
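
The hot standby scheme in the third point can be illustrated with a small in-process simulation. This is a sketch under stated assumptions, not the thesis implementation: the class name, the heartbeat-timeout failure detector, and the explicit `now` timestamps are hypothetical, and a real system would replicate state over a network link between two machines rather than between two dictionaries.

```python
from typing import Dict


class HotStandbyPair:
    """Simulated master/standby pair with in-memory state and no shared storage."""

    def __init__(self, heartbeat_timeout: float = 3.0) -> None:
        self.master_state: Dict[str, int] = {}
        self.standby_state: Dict[str, int] = {}
        self.active = "master"
        # Assumed failure detector: the master is declared failed when no
        # heartbeat has been seen for longer than heartbeat_timeout seconds.
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = 0.0

    def heartbeat(self, now: float) -> None:
        """Record that the master was alive at time `now`."""
        self.last_heartbeat = now

    def write(self, key: str, value: int) -> None:
        """Apply an update and mirror it into the standby's memory."""
        if self.active == "master":
            self.master_state[key] = value
        # The standby always receives the update, so its copy is current
        # at the moment of a switchover.
        self.standby_state[key] = value

    def check(self, now: float) -> str:
        """Switch to the standby if the master's heartbeat has timed out."""
        expired = now - self.last_heartbeat > self.heartbeat_timeout
        if self.active == "master" and expired:
            self.active = "standby"
        return self.active

    def read(self, key: str) -> int:
        """Read from whichever device is currently serving requests."""
        if self.active == "master":
            return self.master_state[key]
        return self.standby_state[key]
```

In this sketch, a value written before the master times out is still readable after `check` promotes the standby, because every write was mirrored into the standby's memory first. Passing timestamps explicitly keeps the example deterministic; a running service would use a monotonic clock instead.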
