Transient and Permanent Error Management for Networks-on-Chip.

Details

Author: Yu, Qiaoyan
Degree: Doctor
Year: 2011
Advisor: Ampadu, Paul
Institution: University of Rochester
ISBN: 9781124666716
CBH: 3458660
Country: USA
Language: English
FileSize: 9568826
Pages: 270

Abstract

Reliability has become one of the most important metrics for on-chip communication infrastructures in nanoscale technologies. Reduced supply voltages and high clock frequencies exacerbate the impact of noise sources such as particle strikes and crosstalk, which can cause transient errors in transmitted data. Manufacturing defects and aging issues can cause permanent errors in the communication links. The modularity of the Networks-on-Chip (NoCs) approach facilitates the exploration of error control techniques for on-chip interconnects and many-core systems. Unfortunately, error control is not free. Worst-case error management methods are simple but waste energy and bandwidth in favorable noise conditions. Consequently, cost-effective techniques for improving link error resilience are needed. In this work, we propose configurable error control methods to tackle variable transient errors and exploit existing transient error control redundancy for permanent error management, achieving high reliability and low average energy consumption with minor area overhead. To adapt to variable transient error rates, a configurable error control coding (ECC) scheme is proposed for datalink-layer transient error management. The proposed method can adjust both error detection and error correction capability at runtime by varying the number of redundant wires for parity check bits. The obtained error resilience makes the proposed method suitable for a range of link error rates. Configuring the number of redundant wires to match the noise conditions reduces the average energy consumption in the ECC codec and interconnect link. A hardware-efficient implementation of the configurable ECC is presented as well. We integrate the error control techniques in the datalink and physical layers to co-manage transient and permanent errors. Infrequently used redundant wires for the configurable ECC are utilized as spare wires to replace permanently unusable links.
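The idea of matching the number of parity wires to the noise conditions can be illustrated with a minimal sketch. The abstract does not name the codes used in the thesis, so the two modes below are assumptions chosen for illustration: a single even-parity bit (one redundant wire, detection only) and a Hamming(7,4) code (three redundant wires, single-error correction). The function names `encode` and `decode` and the 4-bit data width are hypothetical.

```python
from typing import List, Tuple


def encode(data: List[int], mode: str) -> List[int]:
    """Encode 4 data bits; the mode decides how many parity wires are used."""
    d1, d2, d3, d4 = data
    if mode == "detect":
        # Low-noise mode (assumed): one redundant wire carrying even parity.
        return [d1, d2, d3, d4, d1 ^ d2 ^ d3 ^ d4]
    if mode == "correct":
        # High-noise mode (assumed): Hamming(7,4), three redundant wires.
        # Bit positions 1..7 hold p1 p2 d1 p3 d2 d3 d4.
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]
    raise ValueError(f"unknown mode: {mode}")


def decode(word: List[int], mode: str) -> Tuple[List[int], bool]:
    """Return (data bits, retransmission_needed)."""
    if mode == "detect":
        # A nonzero overall parity signals an odd number of bit flips.
        error = (word[0] ^ word[1] ^ word[2] ^ word[3] ^ word[4]) == 1
        return word[:4], error
    if mode == "correct":
        w = list(word)
        s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
        s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
        s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
        syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of a single error
        if syndrome:
            w[syndrome - 1] ^= 1
        return [w[2], w[4], w[5], w[6]], False
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch the detection mode drives five wires and relies on retransmission, while the correction mode drives seven wires and repairs a single flipped bit in place. The two unused parity wires in detection mode are the kind of idle redundancy that the abstract describes reusing as spares.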
To maintain the transient and permanent error co-management capability as noise conditions change, we propose a packet re-organization algorithm combined with a shortening error control coding method. This method reduces the need for energy-consuming fault-tolerant routing, minimizing the latency and energy overhead induced by error control. This co-management method is suitable for NoCs operating in variable noise conditions with a small number of permanently unusable wires. To further improve energy efficiency, the adaptation of ECC is extended to the network layer. We employ end-to-end error control in the network layer in low noise conditions and enhance the error control capability in high noise conditions by adding hop-to-hop error control in the datalink layer. A protocol that boosts or reduces error control strength is presented to support seamless runtime ECC mode switching. Simply combining end-to-end error control with hop-to-hop error control significantly increases energy consumption. To address this issue, we apply the concept of product codes to the dual-layer error control; the hop-to-hop error control is designed to be compatible with one dimension of the product code. Consequently, the dual-layer cooperative error control can switch error control modes without interrupting normal NoC operation, achieving high reliability and energy efficiency over a wide range of link error rates. To evaluate the performance and energy consumption of different error control methods on a large NoC, we propose a flexible parallel NoC simulator. Plug-and-play error control coding (ECC) insertion and some typical error control codecs have been implemented in the simulator. The flexible fault injection environment provided by our simulator assists error control exploration for specific purposes.
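The product-code idea mentioned above can be sketched with the simplest possible component codes. This is an assumption made for illustration: the abstract does not state which codes form the two dimensions, so single-parity checks are used for both rows and columns. Here the row check stands in for the hop-to-hop check, and the combined row and column checks stand in for the end-to-end decoder. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def parity(bits) -> int:
    """Even-parity check value: 0 when the number of ones is even."""
    return sum(bits) % 2


def encode_product(data: Matrix) -> Matrix:
    """Append a parity bit to every row, then a parity row over all columns."""
    rows = [row + [parity(row)] for row in data]
    col_parity = [parity(col) for col in zip(*rows)]
    return rows + [col_parity]


def hop_check(block: Matrix) -> bool:
    """Hop-to-hop view: use only the row dimension to detect an error."""
    return any(parity(row) != 0 for row in block)


def end_to_end_correct(block: Matrix) -> Matrix:
    """End-to-end view: use both dimensions to locate and flip a single error."""
    bad_rows = [i for i, row in enumerate(block) if parity(row)]
    bad_cols = [j for j, col in enumerate(zip(*block)) if parity(col)]
    fixed = [row[:] for row in block]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed


def extract_data(block: Matrix) -> Matrix:
    """Drop the parity column and the parity row."""
    return [row[:-1] for row in block[:-1]]
```

Because the hop-level check reuses the row parity that the end-to-end code already carries, enabling it adds no extra check bits in this sketch, which is the compatibility property the abstract attributes to the dual-layer design.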
In addition, we use C and the Message Passing Interface (MPI) to schedule parallel simulation on a multiprocessor server, addressing the prohibitive simulation time and system resource challenges caused by the large number of communicating nodes and the extensive number of simulation variables.
