Design principles and techniques for enabling reboot-based administration in a persistent state storage system.

Details

Author: Huang, Andrew C.
Degree: Doctoral
Year: 2005
Advisor: Fox, Armando
Institution: Stanford University
Major: Computer Science
ISBN: 0542087065
CBH: 3171798
Country: USA
Language: English
FileSize: 18854506
Pages: 159

Abstract

Managing large computer installations is complex and expensive. System administration already accounts for a majority of the total cost of ownership, often costing 5 to 10 times more than the purchase price of hardware and software. Since long-running trends of decreasing hardware costs and increasing system complexity exacerbate the problem, ease of management will continue to be a critical system design challenge in the future.

In this thesis, I focus on management of persistent state in Internet-scale systems. These systems are characterized by their large scale and dynamically changing environment. At the scale of hundreds or thousands of nodes, node failures are the common case, which makes failure handling an important area of administration on which to focus. When changes in workload or system environment are frequent, system evolution tasks such as scaling are also common tasks administrators must handle. Finally, since Internet applications serve requests globally for fractions of a penny per access, mechanisms for dealing with scale and change must meet the 24 x 7 availability and low-cost requirements.

The approach I take to simplifying state management is to first design the system to have low-cost reboot-based recovery. The properties of "cheap" recovery are that data remains available and data consistency is maintained throughout failure, failover, and recovery. Instead of affecting availability or consistency, failure and recovery manifest as minimal performance degradation that is predictable and bounded. With cheap recovery, system administration can be simplified in two ways. First, cheap recovery simplifies failure detection by lowering the cost of acting on false positives, which in turn enables the use of statistical techniques to turn hard-to-catch failures, such as node degradation, into failure followed by recovery. Second, cheap recovery can be used to cast system evolution tasks like online data repartitioning into failure plus recovery to achieve zero-downtime incremental scaling. These low-cost failure handling and system evolution mechanisms make it possible for the system to be continuously self-adjusting, a key property of self-managing systems.
