用户名: 密码: 验证码:
Optimizing parallel job performance in data-intensive clusters.
详细信息   
  • 作者:Ananthanarayanan ; Ganesh.
  • 学历:Ph.D.
  • 年:2014
  • 毕业院校:University of California
  • Department:Computer Science
  • ISBN:9781303835414
  • CBH:3616643
  • Country:USA
  • 语种:English
  • FileSize:6214744
  • Pages:125
文摘
Extensive data analysis has become the enabler for diagnostics and decision making in many modern systems. These analyses have both competitive as well as social benefits. To cope with the deluge in data that is growing faster than Moore's law,computation frameworks have resorted to massive parallelization of analytics jobs into many fine-grained tasks. These frameworks promised to provide efficient and fault-tolerant execution of these tasks. However,meeting this promise in clusters spanning hundreds of thousands of machines is challenging and a key departure from earlier work on parallel computing. A simple but key aspect of parallel jobs is the all-or-nothing property: unless all tasks of a job are provided equal improvement,there is no speedup in the completion of the job. The all-or-nothing property is critical for the promise of efficient and fault-tolerant parallel computations on large clusters. Meeting this promise in clusters of these scales is challenging and a key departure from prior work on distributed systems. This work examines the execution of a job from first principles and propose techniques spanning the software stack of data analytics systems such that its tasks achieve homogeneous performance while overcoming the various heterogeneities. To that end,we will propose techniques for (i) caching and cache replacement for parallel jobs,which outperforms even Belady's MIN (that uses an oracle),(ii) data locality,and (iii) straggler mitigation. Our analyses and evaluation are performed using workloads from Facebook and Bing production datacenters. Along the way,we will also describe how we broke the myth of disk-locality's importance in datacenter computing.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700