Research on Low-Power Software Optimization Techniques
Abstract
Power consumption has become an increasingly important problem in modern computer systems, for two main reasons. (1) Embedded devices usually run on batteries, and limited battery life makes power an important system parameter alongside performance and area. Battery technology has advanced slowly compared with semiconductor technology, so future mobile devices must deliver more from a limited energy supply. (2) In high-performance computer systems, ever higher transistor integration in pursuit of processor performance has driven power consumption up sharply. Rising power raises chip packaging and cooling costs, and operation at high temperature increases chip failure rates, which reduces system reliability. For large-scale parallel computer systems, excessive power also means enormous energy consumption. Multi-core processors have eased the conflict between performance and power to some extent, but they still generate substantial heat and are hard to cool. In recent years, software low-power techniques for multi-core architectures have become a focus of low-power research.
     Methods for reducing the power consumption of computer systems have appeared at every level, from circuit and logic techniques through architecture to high-level software. This thesis focuses on software low-power techniques: building on traditional performance-oriented optimization, it uses dynamic voltage scaling and component shutdown to reduce power while still meeting performance requirements. The main work covers four areas: (1) an analysis of several processor low-power techniques and of the power characteristics of the on-chip memory system; (2) energy optimization that coordinates the processor and the memory, studied from two angles, energy minimization under a performance constraint and performance optimization under an energy constraint; (3) low-power software prefetching optimization, studied with both static and dynamic voltage scaling; (4) energy optimization of parallel programs on multi-core architectures, which applies processor core shutdown and dynamic voltage scaling. The main contributions of this thesis are as follows:
     1. An energy-optimal energy optimization model for the microprocessor and memory, which addresses the shortcoming of earlier work that optimized performance without considering energy. The model covers the cases that arise in software prefetching optimization and, by adjusting the voltage and frequency of both the processor and the memory, minimizes system energy consumption subject to a performance constraint.
     2. An energy-constrained optimization model for the microprocessor and memory, which addresses performance optimization under a limited energy supply. The model covers the cases that arise in software prefetching optimization and, by adjusting the voltage and frequency of both the processor and the memory, optimizes system performance (minimizes execution time) subject to an energy constraint.
     3. A low-power software prefetching optimization method based on static voltage scaling. The method lowers power by adjusting the processor voltage and further improves performance and power by adjusting the prefetch distance. As a result, the program's average power does not exceed its power before prefetching was applied: the marked power increase caused by software prefetching is eliminated while performance still improves.
     4. PDP-DVS, a power-guided software prefetching optimization algorithm based on dynamic voltage scaling. The algorithm monitors system power in real time and adaptively adjusts the processor voltage so that the program's average power does not exceed its value without prefetching, while still improving performance. Simulation results show that PDP-DVS improves performance without increasing power.
     5. An energy optimization method for parallel programs on multi-core architectures. The method uses processor core shutdown to adjust the number of active cores while the program executes its serial portions, and uses dynamic voltage scaling to adjust the voltage and frequency of each core while it executes its parallel portions. Simulation results show that the method effectively reduces the energy consumption of parallel programs.
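The two optimization problems in contributions 1 and 2 are duals of each other, which a small sketch can make concrete. The abstract does not give the thesis's actual model, so everything below is an illustrative assumption: the operating points, the capacitance constants, the additive (non-overlapping) timing model, and the names `min_energy` and `min_time` are hypothetical. Power is approximated by the usual dynamic CMOS term, C·V²·f, for each component.

```python
from itertools import product
from typing import NamedTuple, Optional, Tuple

# Hypothetical (voltage in volts, frequency in Hz) operating points.
CPU_POINTS = [(0.8, 400e6), (1.0, 600e6), (1.2, 800e6)]
MEM_POINTS = [(1.5, 100e6), (1.8, 200e6)]

# Assumed effective switched capacitances in farads (illustrative only).
C_CPU = 1.0e-9
C_MEM = 0.5e-9


class Choice(NamedTuple):
    cpu: Tuple[float, float]
    mem: Tuple[float, float]
    time: float    # seconds
    energy: float  # joules


def candidates(cpu_cycles: float, mem_cycles: float):
    """Evaluate every processor/memory operating-point pair."""
    for cpu, mem in product(CPU_POINTS, MEM_POINTS):
        # Simplifying assumption: compute and memory phases do not overlap.
        t = cpu_cycles / cpu[1] + mem_cycles / mem[1]
        # Dynamic power C * V^2 * f, both components powered for the whole run.
        p = C_CPU * cpu[0] ** 2 * cpu[1] + C_MEM * mem[0] ** 2 * mem[1]
        yield Choice(cpu, mem, t, p * t)


def min_energy(cpu_cycles: float, mem_cycles: float,
               max_time: float) -> Optional[Choice]:
    """Least-energy setting that meets the time (performance) constraint."""
    feasible = [c for c in candidates(cpu_cycles, mem_cycles)
                if c.time <= max_time]
    return min(feasible, key=lambda c: c.energy, default=None)


def min_time(cpu_cycles: float, mem_cycles: float,
             max_energy: float) -> Optional[Choice]:
    """Fastest setting that stays within the energy budget."""
    feasible = [c for c in candidates(cpu_cycles, mem_cycles)
                if c.energy <= max_energy]
    return min(feasible, key=lambda c: c.time, default=None)


if __name__ == "__main__":
    print(min_energy(8e8, 1e8, max_time=1.9))
    print(min_time(8e8, 1e8, max_energy=1.5))
```

Exhaustive search is used here only because the assumed set of operating points is tiny. Both functions return `None` when no setting satisfies the constraint.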
引文
[1]M. Wolfe. High Performance Compilers for Parallel Computing, Addison-Wesley Publishing Company. 1996.
    
    [2]M. Valluri and L. John. Is compiling for performance = compiling for power? In Proceedings of the 5th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACTS). January 2001.
    
    [3]Vivek Tiwari, Sharad Malik and Andrew Wolfe. Compilation Techniques for Low Energy: An Overview. In Proceedings of the 1994 Symposium on Low-Power Electronics. San Diego, CA. October 1994.
    
    [4]Kaushik Roy and Mark C. Johnson. Software Design for Low Power. In NATO Advanced Study Institute on Low Power Design in Deep Submicron Electronics.NATO ASI Series. 1996.
    
    [5]V. Tiwari, S. Malik and A. Wolfe. Power analysis of embedded software: A first step towards software power minimization. IEEE Transactions on VLSI Systems,2(4). December 1994.
    
    [6]J. Pouwelse, K. Langendoen and H. Sips. Dynamic voltage scaling on a low power microprocessor. In Proceedings of the 7th Annual International Conference on Mobile Computing and Networking. p.251 - 259. July 2001.
    
    [7]E. A. Lahiri K., Dey S. and Panigrahi D. Battery-Driven System Design: A New Frontier in Low Power Design. In Proceedings of Asia and South Pacific Design Automation Conference (ASPDAC-02). Bangalore, India. IEEE CS.2002.
    
    [8]Hongbo Yang. Power-aware Compilation Techniques for High Performance Processors. Doctor dissertation. University of Delaware. Winter 2004.
    
    [9]V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel and F. Baez. Reducing power in high-performance microprocessors. In Proceedings of the 35th Conference on Design Automation. June 1998.
    
    [10]T. Mudge. Power: A First Class Design Constraint for Future Architectures. IEEE Computer, 34(4). p. 52-58. April 2001.
    
    [1 l]Borkar S. Low power design challenges for the decade (invited talk). In Proceedings of the 2001 conference on Asia South Pacific design automation.Yokohama, Japan. ACM Press. p.293-296.2001.
    
    [ 12]Chandrakasan A. P. and Brodersen R. W. Minimizing power consumption in digital CMOS circuits. In Proceedings of IEEE. p.498-523.1995.
    
    [13]Thompson S., Packan P. and Bohr M. MOS Scaling: Transistor Challenges for the 21 st Century. Intel Technology Journal, Q3, .1998.
    
    [14]Ge R., Feng X. and Cameron K. W. Performance-constrained Distributed DVS Scheduling for Scientific Applications on Power-aware Clusters. In Proceeding of the 2005 ACM/IEEE conference on Super computing. Seattle, WA,USA. IEEE Computer Society. March 2005.
    [15]Feng X.,Ge R.and Cameron K.W.Power and Energy Profiling of Scientific Applications on Distributed Systems.In Proceedings of 19th International Parallel and Distributed Processing Symposium(IPDPS 2005).Denver,CA,USA.IEEE Computer Society.April 4-8 2005.
    [16]S.Kougia,A.Chatzigeorgiou and S.Nikolaidis.Power Reduction for Multimedia Applications through Data-reuse Memory Exploration.In Proceedings of IEEE International Conference on Electronics,Circuits and Systems(ICECS-01).Malta.September 2001.
    [17]C.Hughes,J.Srinivasan and S.Adve.Saving energy with architectural and frequency adaptations for multimedia applications.In Proceedings of the 34th Annual International Symposium on Microarchitecture(MICRO-01).2001.
    [18]Ruchira Sasanka,Sarita V.Adve,Yen-Kuang Chen and Eric Debes.Comparing the Energy Efficiency of CMP and SMT Architectures for Multimedia Workloads.UIUC CS Technical Report UIUCDCS-R-2003-2325.March 2003.
    [19]Chung-Hsing Hsu.Compiler-Directed Dynamic Voltage and Frequency Scaling for CPU Power and Energy Reduction.Doctor dissertation.Rutgers,The State University of New Jersey.New Brunswick,New Jersey.October 2003.
    [20]赵荣彩,唐志敏,张兆庆,Guang R.Gao.软件流水的低功耗编译技术研究.斩件学报,14(8).p.1357-1363.2003.
    [21]International Technology Roadmap for Semiconductors,2005 Edition.ITRS,May 2006.http://public.itrs.net.
    [22]新闻观察.英特尔承认遭遇芯片发热技术壁垒.新浪科技.2004年05月18日.
    [23]Intel Multicore Fact Sheet.http://www.intel.com/pressroom/kits/core2duo/pdf/intel_multicore_fact_sheet.pdf.
    [24]AMD Multi-Core Processors.http://www.via.com.tw/en/downloads/presentations/events/vtf2005/vtf05_hdc_amd .pdf.
    [25]Kim N.S.,Austin T.and Blaauw D.et al.Leakage Current:Moore's Law Meets Static Power.IEEE Computer,36(12).p.65-77.2003.
    [26]Kim N.S.,Blaauw D.and Mudge T.Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches.In Proceedings of 2003International Conference on Computer-Aided Design(ICCAD'03).San Jose,California,USA.IEEE Computer Society.p.627-632.November 11-13 2003.
    [27]拉贝,钱德拉卡山,尼科利奇.数字集成电路.设计透视(第2版),北京:清华大学出版社.2004.
    [28]Hsu C.and Feng W.A Power-Aware Run-Time System for High-Performance Computing.In Proceedings of the ACM/IEEE SC'2005 Conference on High Performance Networking and Computing.Seattle,WA,USA.IEEE CS.March 2005.
    [29]易会战.低功耗技术研究.体系结构和编译优化.博士学位论文.国防科学技术大学.计算机学院.长沙.2006.
    [30]R.Rajamony and R.Bianchini.Energy management for server clusters.In Tutorial,16th Annual ACM International Conference on Supercomputing.June 2002.
    [31]Osman S.Unsal and Israel Koren.System-level power-aware design techniques in real-time systems.Proceedings of the IEEE,91(7).July 2003.
    [32]P.Huber and M.Mills.Dig more coal:The pcs are coming.Forbes Magazine.May 1999.
    [33]E.A.Kawamoto K.,Koomey J.H.G.and Dman B.N.Electricity Used by Office Equipment and Network Equipment in the U.S.Lawrence Berkeley National Lab,Berkeley CA.Feb 2001.
    [34]骆祖莹.芯片功耗与摩尔定律的终结.技术报告.清华大学计算机系EDA实验室..
    [35]Lorch J.R.A Complete Picture of the Energy Consumption of a Portable Computer.Masters Thesis.University of California at Berkeley.Computer Science.December 1995.
    [36]Tiwari V.,Singh D.and Rajgopal S.et al.Reducing Power in High-performance Microprocessors.In Proceedings of the 35th annual conference on Design automation.San Francisco,CA USA.ACM Press.p.732-737.1998.
    [37]Gowan M.K.,Biro L.L.and Jackson D.B.Power Considerations in the Design of the Alpha 21264 Microprocessor.In Proceedings of the 35th annual conference on Design automation.San Francisco,CA USA.ACM Press.p.726-731.1998.
    [38]Unsal O.S.,Ashok R.and Koren I.et al.ool-Cache:A compiler-enabled energy efficient data caching framework for embedded/multimedia processors.ACM Transactions on Embedded Computing Systems(TECS),Special issue on power-aware embedded computing,2(3).p.373-392.2003.
    [39]Jung S.,Kim K.and Kang S.Low-Swing Clock Domino Logic Incorporating Dual Supply and Dual Threshold Voltages.In Proceedings of the 39th conference on Design automation.New Orleans,Louisiana,USA.ACM Press.p.467-472.2002.
    [40]Amelifard B.,Fallah F.and Pedram M.Low-Power Fanout Optimization Using Multiple Threshold Voltage Inverters.In Proceedings of the 2005 International Symposium on Low Power Electronics and Design.San Diego,California,USA.ACM Press.p.95-98.August 8-10 2005.
    [41]Calhoun B.H.and Chandrakasan A.Characterizing and Modeling Minimum Energy Operation for Subthreshold Circuits.In Proceedings of International Symposium on Low Power Electronics and Design 2004.Newport Beach,California,USA.ACM Press.p.90-95.August 9-11 2004.
    [42]Gandhi K.R.and Mahapatra N.R.A Detailed Study of Hardware Techniques that Dynamically Exploit Frequent Operands to Reduce Power Consumption in Integer Function Units.In Proceedings of Second Annual Workshop on Duplicating,Deconstructing and Debunking.San Diego,California.June 8 2003.
    [43]Donno M.,Ivaldi A.and Benini L.et al.Clock-Tree Power Optimization based on RTL Clock-Gating.In Proceedings of the 40th conference on Design automation.Anaheim,California,USA.ACM Press.p.622-627.June 2-6 2003.
    [44]Heydari P.and Pedram M.Interconnect Energy Dissipation in High-Speed ULSI Circuits.In Proceedings of ASP-DAC/VLSI Design 2002.Bangalore,India.IEEE Computer Society.p.132-140.January 2002.
    [45]Kapur P.,Chandra G.and Saraswat K.C.Power Estimation in Global Interconnects and its Reduction Using a Novel Repeater Optimization Methodology.In Proceedings of the 39th conference on Design automation.New Orleans,Louisiana,USA.ACM Press.p.461-466.June 2002.
    [46]Wason V.and Banerjee K.A Probabilistic Framework for Power-Optimal Repeater Insertion in Global Interconnects under Parameter Variations.In Proceedings of the 2005 international symposium on Low power electronics and design.San Diego,California,USA.ACM Press.p.131-136.August 8-10 2005.
    [47]Ananthan H.,Kim C.H.and Roy K.Larger-than-Vdd Forward Body Bias in Sub-0.5V Nanoscale CMOS.In Proceedings of the 2004 international symposium on Low power electronics and design.Newport Beach,California,USA.ACM Press.p.8-13.August 9-11 2004.
    [48]Rao R.M.,Burns J.L.and Devgan A.Efficient Techniques for Gate Leakage Estimation.In Proceedings of the 2003 international symposium on Low power electronics and design.Seoul,Korea.ACM Press.p.100-103.August 25-27 2003.
    [49]Piguet C.,Renaudin M.and Omnes T.J.Special Session on Low-Power Systems on Chips(SOCs).In Proceedings of Design,Automation,and Test in Europe(DATE '01).Paris,France.IEEE Computer Society.February 2001.
    [50]Powell M.D.,Schuchman E.and Vijaykumar T.N.Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines.In Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture.Barcelona,Spain.IEEE Computer Society.p.294-304.November 12-16 2005.
    [51]Ku J.C.,Ozdemir S.and Memik G.et al.Thermal Management of On-Chip Caches Through Power Density Minimization.In Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture.IEEE Computer Society.p.283-293.November 12-16 2005.
    [52]Chung-Hsing Hsu and U.Kremer.The Design,Implementation,and Evaluation of a Compiler Algorithm for CPU Energy Reduction.In ACMSIGPLAN Conference on Programming Languages,Design,and Implementation(PLDI'03).San Diego,CA.June 2003.
    [53]M.Kandemir,N.Vijaykrishnan,M.J.Irwin,W.Ye and I.Demirkiran.Register Relabeling:A Post Compilation Technique for Energy Reduction.In Proceedings of Workshop on Compilers and Operating Systems for Low Power(COLP'00).Philadelphia,PA.October 2000.
    [54]U.Kremer,J.Hicks and J.Rehg.A Compilation Framework for Power and Energy Management on Mobile Computers.In Proceedings of International Workshop on Languages and Compilers for Parallel Computing(LCPC'01).Cumberland,KT.August 2001.
    [55]赵荣彩.多线程低功耗编译优化技术研究.博士论文.中国科学院计算技术研究所.Oct.2002.
    [56]Ultra Low Power Technology Group. http://www-star.stanford.edu/projects/ulp/.
    [57]PARAPET RESEARCH GROUP. http://parapet.ee.princeton.edu/.
    
    [58]Microsystems Design Laboratory.http://mdlwiki.cse.psu.edu/twiki/bin/view/MDL/WebHome.
    
    [59]Project--Compiler Optimizations for Power Aware Computing.http://codesign.ece.gatech.edu/projects/pac/.
    
    [60]Suresh D. C., Agrawal B., Yang J. and et al. Power Efficient Encoding Techniques for Off-chip Data. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES-03). ACM Press. p.267-275. October 30 2003.
    
    [61]Gupta J. R. C. Z. Frequent Value Encoding for Low Power Data Buses. ACM Transactions on Design Automation of Electronic Systems, 9(3). p.354-384.2004.
    
    [62] Cheng W. and Pedram M. Low Power Techniques for Address Encoding and Memory Allocation. In Proceedings of the 2001 conference on Asia South Pacific design automation. Yokohama, Japan. ACM Press. p.245-250. January 30 2001.
    
    [63]Zhang C. and Vahid F. A Power-Configurable Bus for Embedded Systems. In Proceedings of IEEE International Symposium on Circuits and Systems. IEEE CS.p.809-812. May 2002.
    
    [64]Basu K., Choudhary A., Pisharath J. and et al. Power Protocol: Reducing Power Dissipation on Off-Chip Data Buses. In Proceedings of the 35th Annual International Symposium on Microarchitecture. Istanbul, Turkey. ACM Press.p.345-355. November 18-22 2002.
    
    [65]Li H., Bhunia S., Chen Y. and et al. Deterministic Clock Gating for Microprocessor Power Reduction. In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA-03). Anaheim,California, USA. IEEE CS. February 8-12 2003.
    
    [66]Wu Q., Pedram M. and Wu X. Clock-Gating and Its Application to Low Power Design of Sequential Circuits. IEEE Trans on Circuits and Systems I:Fundamental Theory and Applications, 47(3). p. 415-420.2000.
    
    [67]Emnett F. and Biegel M. Power Reduction Through RTL Clock Gating. Tech Report. Automotive Integrated Electronics Corporation, November 2000.
    
    [68]Luo Y., Yu J., Yang J. and et al. Low Power Network Processor Design Using Clock Gating. In Proceedings of the 42nd Design Automation Conference (DAC-05). Anaheim, California, USA. ACM Press. p.712-715. June 13-17 2005.
    
    [69]Tang W., Gupta R. and Nicolau A. Power Savings in Embedded Processors through Decode Filter Cache. In Proceedings of Design, Automation and Test in Europe Conference and Exposition (DATE-02). Paris, France. IEEE CS. p.443-448.March 4-8 2002.
    
    [70]Zhang C., Yang J. and Vahid F. Low Static-Power Frequent-Value Data Caches. In Proceedings of 2004 Design, Automation and Test in Europe Conference and Exposition (DATE-04). Paris, France. IEEE CS. p.214-219. Feb.16-20 2004.
    [71]Hu J.S.,Vijaykrishnan N.,Kim S.and et al.Scheduling Reusable Instructions for Power Reduction.In Proceedings of the conference on Design,automation and test in Europe - Volume 1.Paris,France.IEEE CS.p.148-155.Feb.16-20 2004.
    [72]Yang J.and Gupta R.Energy Efficient Frequent Value Data Cache Design.In Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture.Istanbul,Turkey.ACM/IEEE.p.197-207.November 18-222002.
    [73]Macii A.,Macii E.and Poncino M.Improving the Efficiency of Memory Partitioning by Address Clustering.In Proceedings of 2003 Design,Automation and Test in Europe Conference and Exposition(DATE-03).Munich,Germany.IEEE CS.p.10018-10023.March 3-7 2003.
    [74]Zyuban V.V.Inherently Lower-Power High-Performance Superscalar Architectures.Notre Dame,Indiana,Department of Computer Science and Engineering.2000.
    [75]Tseng J.H.and KrsteAsanovi'c.Banked Multiported Register Files for High-Frequency Superscalar Microprocessors.In Proceedings of 30th International Symposium on Computer Architecture(ISCA-30).IEEE CS.p.62-71.June 9-11 2003.
    [76]Ashok R.,Chheda S.and Moritz C.A.Cool-Mem:Combining Statically Speculative Memory Accessing with Selective Address Translation for Energy Efficiency.In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS-X).San Jose,California.ACM Press.p.133-143.October 2002.
    [77]Kucuk G.,Ghose K.,Ponomarev D.V.and et al.Energy-Efficient Instruction Dispatch Buffer Design for Superscalar Processors.In Proceedings of the 2001International Symposium on Low Power Electronics and Design.Huntington Beach,California,USA.ACM Press.p.237-242.August 2001.
    [78]V.Delaluz,M.Kandemir,N.Vijaykrishnan,A.Sivasubramiam and M.J.Irwin.DRAM energy management using software and hardware directed power mode control.In Proceedings of the 7th International Syrup.on High Performance Computer Architecture.Nuevo Leone,Mexico.IEEE CS.p.159-170.January 20-24 2001.
    [79]Balasubramonian R.,Dwarkadas S.and Albonesi D.H.Reducing the Complexity of the Register File in Dynamic Superscalar Processors.In Proceedings of the 34th Annual International Symposium on Microarchitecture.Austin,Texas,USA.ACM/IEEE.p.237-248.December 1-5 2001.
    [80]Park I.,Powell M.D.and Vijaykumar T.N.Reducing Register Ports for Higher Speed and Lower Energy.In Proceedings of the 35th Annual International Symposium on Microarchitecture.Istanbul,Turkey.ACM/IEEE.p.171-182.November 18-22 2002.
    [81]Zhang C.,Vahid F.and Najjar W.A Highly Configurable Cache Architecture for Embedded Systems.In Proceedings of 30th International Symposium on Computer Architecture(ISCA-03).San Diego,California,USA.IEEE CS.p.136-146.June 9-11 2003.
    [82]Wang H., Peh L. and Malik S. Power-driven Design of Router Microarchitectures in On-chip Networks. In Proceedings of the 36th Annual International Symposium on Microarchitecture. San Diego, CA, USA. ACM/IEEE.p.105-116. December 3-5 2003.
    
    [83]Pisharath J., Choudhary A. and Kandemir M. Reducing Energy Consumption of Queries in Memory-Resident Database Systems. In Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. Washington DC, USA. ACM Press. p.35-45. September 22-25 2004.
    
    [84]Albonesi D. H. Selective Cache Ways: On-Demand Cache Resource Allocation.In Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture. Haifa, Israel. ACM/IEEE. November 16-18 1999.
    
    [85]Witchel E. and KrsteAsanovi'c. The Span Cache: Software Controlled Tag Checks and Cache Line Size. In Proceedings of Workshop on Complexity-Effective Design, 28th ISCA. Gotenborg, Sweeden. IEEE CS. p.1-12. July 2001.
    
    [86]Witchel E., Larsen S., Ananian C. S. and et al. Direct Addressed Caches for Reduced Power Consumption. In Proceedings of 34th Annual International Symposium on Microarchitecture (MICRO-01). Austin, Texas. ACM/IEEE.p.124-133.December 2001.
    
    [87]T. Burd, T. Pering, A. Stratakos and R. Broderson. A dynamic voltage scaled microprocessor system. IEEE Journal of Solid-State Circuits, 35(11). p. 1571 -1580. November 2000.
    
    [88]Padmanabhan Pillai and Kang G. Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. In Proceedings of the 18th Symposium on Operating Systems Principles. October 2001.
    
    [89]L. Clark, E. Hoffman, M. Schaecher, M. Biyani, D. Roberts and Y. Liao. A scalable performance 32b microprocessor. In Proceedings of International Solid-State Circuits Conference (ISSCC-01). February 2001.
    
    [90]K. Nowka, G. Carpenter, E. Mac Donald, H. Ngo, B. Brock, K. Ishii, T. Nguyen and J. Burns. A 0.9V to 1.95V Dynamic Voltage-Scalable and Frequency-Scalable 32b PowerPC Processors. In Proceedings of International Solid-State Circuits Conference (ISSCC-02). IEEE Press. p.340-341.2002.
    
    [91 ]Fleischmann M. LongRun Power Management: Dynamic Power Management for Crusoe Processors. Tech Report. Transmeta Corporation, January 17, 2001.
    
    [92]Paper I. W. Enhanced Intel SpeedStep Technology for the Intel Pentium M Processor. Tech Report. Order Number: 301170-001. March 2004.
    
    [93]A. Chandrakasan, S. Sheng and R. Brodersen. Low power CMOS digital design.IEEE Journal of Solid-State Circuits, 27(4). p. 473 - 484. April 1992.
    
    [94]V. Kaenel, P. Macken and M. Degrauwe . A voltage reduction technique for battery-operated systems. IEEE Journal of Solid-State Circuits, 25(5). 1990.
    
    [95]L. Nielsen, C. Niessen, J. Sparso and C. Berkel. Low-power operation using self-timed circuits and adaptive scaling of the supply voltage. IEEE Transactions on Very Large Scale Integration (VLSI) System, 2(4). 1994.
    
    [96]Crusoe Processor Model TM5700/TM5900 Data Book._______________________ http://www.transmeta.com/crusoe_docs/tm5900_databook_040204.pdf.
    
    [97]Intel XScale (tm) Core Developer's Manual. 2002.http://developer.intel.com/design/intelxscale/.
    
    [98]Smith S. F. A Multiple-Clock-Domain Bus Architecture Using Asynchronous FIFOS as Elastic Elements. Ph.D Thesis. University of Idaho. College of Graduate Studies. October 2003.
    
    [99]Semeraro G., Magklis G., Balasubramonian R. and et al. Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA-02). Boston, Massachusetts,USA. IEEE CS. p.29-42. February 2-6 2002.
    
    [100]Iyer A. and Marculescu D. Power and Performance Evaluation of Globally Asynchronous Locally Synchronous Processors. In Proceedings of 29th International Symposium on Computer Architecture (ISCA-02). Anchorage, AK,USA. IEEE CS. May 25-29 2002.
    
    [101]Oliver J., Rao R., Sultana P. and et al. Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor. In Proceedings of 31st International Symposium on Computer Architecture (ISCA-04). Munich, Germany.IEEE CS. p.150-161. June 19-23 2004.
    
    [102]Magklis G., Scott M. L., Semeraro G. and et al. Profile-based Dynamic Voltage and Frequency Scaling for a Multiple Clock Domain Microprocessor. In Proceedings of 30th International Symposium on Computer Architecture (ISCA-03). San Diego, California, USA. IEEE CS. p.14-25. June 9-11 2003.
    
    [103]Sasaki H., Kondo M. and Nakamura H. Dynamic Instruction Cascading on GALS Microprocessors. In Proceedings of Integrated Circuit and System Design,Power and Timing Modeling, Optimization and Simulation, 15th International Workshop (PATMOS-05). Leuven, Belgium. Springer. p.30-39. September 21-23 2005.
    
    [104]Semeraro G. P. Multiple Clock Domain Microarchitecture Design and Analysis.Ph.D Thesis. University of Rochester.
    
    [105]Choi K., Soma R. and Pedram M. Dynamic Voltage and Frequency Scaling based on Workload Decomposition. In Proceedings of the 2004 international symposium on Low power electronics and design. Newport Beach, California,USA. ACM Press. p.174-179. August 9-11 2004.
    
    [106]Halter J. P. and Najm F. N. A Gate-Level Leakage Power Reduction Method for Ultra-Low-Power CMOS Circuits. In Proceedings of IEEE Custom Integrated Circuits Conference. p.475-478.1997.
    
    [107]Johnson M. C., Somasekhar D. and Roy K. Leakage Control With Efficient Use of Transistor Stacks in Single Threshold CMOS. In Proceedings of the 36th ACM/IEEE conference on Design automation. New Orleans, Louisiana, United States. ACM Press. p.442-445.1999.
    
    [108]Abdollahi A., Fallah F. and Pedram M. Runtime Mechanisms for Leakage Current Reduction in CMOS VLSI Circuits. In Proceedings of the 2002 international symposium on Low power electronics and design. Monterey, California,USA.ACM Press.p.213-218.2002.
    [109]Kuroda T.,Fujita T.,Mita S.and et al.A 0.9V 150MHz 10mW 4mm2 2-D Discrete Cosine Transform Core Processor with Variable-Threshold-Voltage Scheme.In Proceedings of 4th International Workshop on Advanced LSI's.Korea.p.150-158.July 18-20 1996.
    [110]Dropsho S.,Kursun V.,Albonesi D.H.and et al.Managing Static Leakage Energy in Microprocessor Functional Units.In Proceedings of the 35th Annual International Symposium on Microarchitecture.Istanbul,Turkey.ACM/IEEE.p.321-332.November 18-22 2002.
    [111]Li L.,KadayifI.,Tsai Y.and et al.Leakage Energy Management in Cache Hierarchies.In Proceedings of 2002 International Conference on Parallel Architectures and Compilation Techniques(PACT-02).Charlottesville,VA,USA.IEEE CS.p.131-140.September 22-25 2002.
    [112]Hu Z.,Buyuktosunoglu A.,Srinivasan V.and et al.Microarchitectural Techniques for Power Gating of Execution Units.In Proceedings of the 2004International Symposium on Low Power Electronics and Design.Newport Beach,California,USA.ACM Press.p.32-37.August 9-11 2004.
    [113]Yang S.,Powell M.D.,Falsafi B.and et al.An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches.In Proceedings of the Seventh International Symposium on High-Performance Computer Architecture(HPCA-01).Nuevo Leone,Mexico.IEEE CS.p.147-158.January 20-24 2001.
    [114]KrisztiánFlautner,Kim N.S.,Martin S.and et al.Drowsy Caches:Simple Techniques for Reducing Leakage Power.In Proceedings of 29th International Symposium on Computer Architecture(ISCA-02).Anchorage,AK,USA.IEEE CS.p.148-157.May 25-29 2002.
    [115]Kim N.S.,KrisztiánFlautner,Blaauw D.and et al.Drowsy Instruction:Caches Leakage Power Reduction using Dynamic Voltage Scaling and Cache Sub-bank Prediction.In Proceedings of the 35th Annual International Symposium on Microarchitecture.Istanbul,Turkey.ACM/IEEE.p.219-230.November 18-222002.
    [116]Kim N.S.,Flautner K.,Blaauw D.and et al.Circuit and Microarchitectural Techniques for Reducing Cache Leakage Power.IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION(VLSI) SYSTEMS,12(2).p.167-184.2004.
    [117]Duarte D.,Tsai Y.,Vijaykrishnan N.and et al.Evaluating Run-Time Techniques for Leakage Power Reduction.In Proc.of ASP-DAC/VLSI Design 2002.Bangalore,India.IEEE online,p.31-38.January 7-11 2002.
    [118]V.Tiwari,S.Malik,A.Wolfe and M.Lee.Instruction level power analysis and optimization of software.Journal of VLSI Signal Processing,13(2/3).p.1-18.1996.
    [119]M.Lee,V.Taiwari,S.Malik and M.Fujita.Power analysis and minimization techniques for embedded DSP software.IEEE Transactions on VLSI Systems,5(1).March 1997.
    [120]B. Klass, D. Thmoas, H. Schmit and D. Nagle. Modeling inter-instruction energy effects in a digital signal processor. In Proceedings of the Power-Driven Microarchitecture Workshop. June 1998.
    
    [121]A. Sinha and A. Chandrakasan. JouleTrack - a web based tool for software energy profiling. In Proceedings of the 37th Conference on Design Automation (DAC-00).June 2000.
    
    [122]Stefan Steinke, Markus Knauer, Lars Wehmeyer and Peter Marwedel. An accurate and fine grain instruction-level energy model supporting software optimizations. In Proceedings of International Workshop on Power and Timing Modeling,Optimization and Simulation (PATMOS-01). September 2001.
    
    [123]G. Qu, N. Kawabe, K. Usami and M. Potkonjak. Function-level power estimation methodology for microprocessors. In Proceedings of Design Automation Conference (DAC-00). June 2000.
    
    [124]C. Brandolese, W. Fornaciari, F. Salice and D. Sciuto. An instruction-level functionality-based energy estimation model for 32-bits microprocessors. In Proceedings of the 37th IEEE-Design Automation Conference (DAC-00). p.346 -351.June 2000.
    
    [125]G. Sinevriotis and T. Stouraitis. Power analysis of the ARM7 embedded microprocessor. In Proceedings of International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS-99). October 6-8 1999.
    
    [126]Landman P. E. High-Level Power Estimation. In Proceedings of the 1996 International Symposium on Low Power Electronics and Design. IEEE Computer Society. p.29-35. August 12-14 1996.
    
    [127]Kim N. S., Austin T. and Mudge T. et al. Challenges for Architectural Level Power Modeling. Power aware computing. Norwell, MA, USA, Kluwer Academic Publishers. p. 317-337.2002.
    
    [128]Cai G. and Lim C. H. Architectural level power/performance optimization and dynamic power estimation. In Proceedings of CoolChips Tutorial colocated with MICRO32. Haifa, Israel. November 16-18 1999.
    
    [129]Dhodapkar A., Lim C., Cai G. and et al. TEM2P2EST: A Thermal Enabled Multi-Model Power/Performance ESTimator. In Proceedings of the First International Workshop on Power-Aware Computer Systems. p. 112-125.2000.
    
    [130]http://eda.ee.ucla.edu/PowerImpact/.
    
    [131]W. Ye, N. Vijaykrishnan, M. Kandemir and M. J. Irwin. The design and use of SimplePower: A cycle-accurate energy estimation tool. In Proceedings of Design Automation Conference (DAC). June 2000.
    
    [132]D. Brooks, V. Tiwari and M. Martonosi. Wattch: A framework for architectural-level power analysis and optimizations. In Proceedings of 27th International Symposium on Computer Architecture (ISCA). p.83-94. June 2000.
    
    [133]D. Ponomarev, G. Kucuk and K. Ghose . AccuPower: An accurate power estimation tool for superscalar microprocessors. In Proceedings of Design,Automation and Test in Europe Conference. March 2002.
    
    [134]The SimpleScalar-Arm Power Modeling Project.Sim-Panalyzer 2.0 Reference Manual.Tech Report.University of Michigan,the University of Colorado,2004.
    [135]Shivakumar P.and Jouppi N.P.CACTI 3.0:An Integrated Cache Timing,Power,and Area Model.Tech Report.WRL-2001-2.DEC Corporation,2001.
    [136]Zhang Y.,Parikh D.,Sankaranarayanan K.and et al.Hotleakage:A temperature-aware model of subthreshold and gate leakage for architects.Tech Report.CS-2003-05.Department of Computer Science,University of Virginia,2003.
    [137]Skadron K.,Stan M.R.,Huang W.and et al.Temperature-Aware Microarchitecture.In Proceedings of 30th International Symposium on Computer Architecture(ISCA-03).San Diego,California,USA.IEEE Computer Society.p.2-13.June 9-11 2003.
    [138]Wang H.,Zhu X.,Peh L.and et al.Orion:A Power-Performance Simulator for Interconnection Networks.In Proceedings of the 35th Annual International Symposium on Microarchitecture.Istanbul,Turkey.ACM/IEEE.p.294-305.November 18-22 2002.
    [139]S.Gurumurthi,A.Sivasubramaniam,M.J.Irwin,N.Vijaykrishnan,M.Kandemir,T.Li and L.K.John.Using complete machine simulation for software power estimation:The SoftWatt approach.In Proceedings of the 8th International Symposium on High Performance Computer Architecture.February 2002.
    [140]Chen J.,Dubois M.and Stenström P.Integrating Complete-System and User-level Performance/Power Simulators:The SimWattch Approach.In Proceedings of 2003 IEEE International Symposium on Performance Analysis of Systems and Software(ISPASS-03).Austin,Texas,USA.IEEE Computer Society.p.1-10.March 6-8 2003.
    [141]Contreras G.,Martonosi M.,Peng J.and et al.XTREM:A Power Simulator for the Intel XScale Core.In Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages,Compilers,and Tools for Embedded Systems (LCTES-04).Washington,DC,USA.ACM Press.p.115-125.June 11-13 2004.
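    [142]Wang Yongwen (王永文).Research on Architecture-Level Power Estimation and Optimization Techniques for High-Performance Microprocessors (in Chinese).Ph.D.thesis.Graduate School of National University of Defense Technology.October 2004.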
    [143]Minh D.Q.,Bengtsson L.and Larsson-edefors P.DSP-PP:A Simulator/Estimator of Power Consumption and Performance for Parallel DSP Architectures.In Proceedings of The 21st IASTED International Multi-Conference on Applied Informatics(AI-03).Innsbruck,Austria.IASTED/ACTA Press.p.767-772.February 10-13 2003.
    [144]T.Austin,E.Larson and D.Ernst.SimpleScalar:An infrastructure for computer system modeling.IEEE Computer,35(2).2002.
    [145]PowerAnalyzer:The SimpleScalar-Arm Power Modeling Project.http://www.eecs.umich.edy/jringenb/power/.
    [146]J.Flinn and M.Satyanarayanan.PowerScope:A tool for profiling the energy usage of mobile applications.In Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications.February 1999.
    [147]F.Chang,K.Farkas and P.Ranganathan.Energy-driven statistical profiling: Detecting software hotspots.In Proceedings of Workshop on Power Aware Computing Systems(PACS-02).February 2002.
    [148]L.Benini and G.Micheli.System-level power optimization:Techniques and tools.ACM Transactions on Design Automation of Electronic Systems,5.p.115-192.April 2000.
    [149]M.Kandemir,N.Vijaykrishnan,M.J.Irwin and W.Ye.Influence of compiler optimizations on system power.In Proceedings of Design Automation Conference (DAC).June 2000.
    [150]V.Delaluz,M.Kandemir,N.Vijaykrishnan and M.J.Irwin.Energy-Oriented Compiler Optimizations for Partitioned Memory Architectures.In Proceedings of CASES.2000.
    [151]Hongbo Yang,G.Gao,A.Marquez,G.Cai and Z.Hu.Power and energy impact by loop transformations.In Proceedings of the Workshop on Compilers and Operating Systems for Low Power(COLP-01).September 2001.
    [152]J.Seng and D.Tullsen.The effect of compiler optimizations on Pentium 4 power consumption.In Proceedings of the 7th Annual Workshop on Interaction between Compilers and Computer Architectures.February 2003.
    [153]M.Kandemir,N.Vijaykrishnan,M.J.Irwin and H.S.Kim.Experimental evaluation of energy behavior of iteration space tiling.In Proceedings of International Workshop on Languages and Compilers for Parallel Computing (LCPC-00).August 2000.
    [154]T.Burd and R.Brodersen.Design issues for dynamic voltage scaling.In Proceedings of 2000 International Symposium on Low Power Electronics and Design(ISLPED'00).July 2000.
    [155]M.Weiser,B.Welch,A.Demers and S.Shenker.Scheduling for reduced CPU energy.In Proceedings of the 1st Symposium on Operating Systems Design and Implementation(OSDI-94).p.13-23.November 1994.
    [156]Jacob Rubin Lorch.Operating Systems Techniques for Reducing Processor Energy Consumption.Ph.D.thesis.University of California,Berkeley.2001.
    [157]D.Mosse,H.Aydin,B.Childers and R.Melhem.Compiler-assisted dynamic power-aware scheduling for real-time applications.In Workshop on Compiler and Operating Systems for Low Power(COLP'00).October 2000.
    [158]R.Ernst and W.Ye.Embedded program timing analysis based on path clustering and architecture classification.In Proceedings of Computer-Aided Design (ICCAD-97).p.598-604.1997.
    [159]T.Ishihara and H.Yasuura.Voltage scheduling problem for dynamically variable voltage processors.In International Symposium on Low Power Electronics and Design(ISLPED-98).p.197-202.August 1998.
    [160]A.Manzak and C.Chakrabarti.Variable voltage task scheduling for minimizing energy or minimizing power.In Proceedings of the International Conference on Acoustics,Speech and Signal Processing.June 2000.
    [161]V.Swaminathan and K.Chakrabarty.Investigating the effect of voltage switching on low-energy task scheduling in hard real-time systems.In Asia South Pacific Design Automation Conference(ASP-DAC'01).2001.
    [162]K.Govil,E.Chan and H.Wasserman.Comparing algorithms for dynamic speed-setting of a low-power CPU.In the 1st ACM International Conference on Mobile Computing and Networking(MOBICOM-95).p.13-25.November 1995.
    [163]J.Lorch and A.Smith.Improving dynamic voltage algorithms with PACE.In Proceedings of the International Conference on Measurement and Modeling of Computer Systems(SIGMETRICS-01).June 2001.
    [164]T.Pering,T.Burd and R.Brodersen.The simulation and evaluation of dynamic voltage scaling algorithms.In Proceedings of 1998 International Symposium on Low Power Electronics and Design(ISLPED-98).p.76-81.August 1998.
    [165]D.Grunwald,P.Levis,K.Farkas,C.Morrey Ⅲ and M.Neufeld.Policies for dynamic clock scheduling.In Proceedings of the 4th Symposium on Operating System Design and Implementation(OSDI-00).October 2000.
    [166]A.Sinha and A.Chandrakasan.Dynamic voltage scheduling using adaptive filtering of workload traces.In Proceedings of the 14th International Conference on VLSI Design.January 2001.
    [167]H.Saputra,M.Kandemir,N.Vijaykrishnan,M.J.Irwin,J.Hu,C.-H Hsu and U.Kremer.Energy-conscious compilation based on voltage scaling.In ACM SIGPLAN Joint Conference on Languages,Compilers,and Tools for Embedded Systems and Software and Compilers for Embedded Systems(LCTES/SCOPES'02).June 2002.
    [168]C.Hsu,U.Kremer and M.Hsiao.Compiler-Directed Dynamic Voltage/Frequency Scheduling for Energy Reduction in Microprocessors.In Proceedings of International Symp.on Low Power Electronics and Design(ISLPED-01).p.275-278.August 2001.
    [169]D.Shin and J.Kim.A profile-based energy-efficient intra-task voltage scheduling algorithm for hard real-time applications.In Proceedings of the International Symposium on Low-Power Electronics and Design(ISLPED'01).August 2001.
    [170]Fen Xie,Margaret Martonosi and Sharad Malik.Compile-Time Dynamic Voltage Scaling Settings:Opportunities and Limits.In Proceedings of ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI-03).San Diego,California,USA.ACM Press.p.49-62.June 9-11 2003.
    [171]S.Borkar.Design challenges for technology scaling.IEEE Micro,19(4).July-August 1999.
    [172]S.Gunther,F.Binns,D.M.Carmean and J.C.Hall.Managing the impact of increasing microprocessor power consumption.Intel Technology Journal,Q1,2001.
    [173]T.Agerwala.Computer architecture:Challenges and opportunities for the next decade.In Proceedings of International Symposium on Computer Architecture (Keynote presentation).München,Germany.June 2004.
    [174]U.Weiser.Microprocessors:Bypass the power wall.In Intel Academic Forum (Keynote presentation).Barcelona,Spain.April 2004.
    [175]E.Grochowski,R.Ronen,J.Shen and H.Wang.Best of both latency and throughput.In Proceedings of International Conference on Computer Design.San Jose,CA.p.236-243.October 2004.
    [176]I.Kadayif,M.Kandemir and U.Sezer.An Integer Linear Programming Based Approach for Parallelizing Applications in On-Chip Multiprocessors.In Proceedings of the 39th IEEE/ACM Design Automation Conference(DAC-02).New Orleans,LA,USA.p.703-708.June 10-14 2002.
    [177]I.Kadayif,M.Kandemir,N.Vijaykrishnan,M.J.Irwin and I.Kolcu.Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors.In Proceedings of Design,Automation and Test in Europe (DATE-04).Paris,France.p.1158-1163.Feb.2004.
    [178]Jian Li and Jose F.Martinez.Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors.In Proceedings of Symposium on Performance Analysis of Systems and Software(ISPASS-05).Austin,TX.March 2005.
    [179]Jian Li and Jose F.Martinez.Dynamic Power-Performance Adaptation of Parallel Computation on Chip Multiprocessors.In Proceedings of the International Symposium on High Performance Computer Architecture(HPCA-06).2006.
    [180]Juan Chen,Yong Dong,Xuejun Yang and Dan Wu.A Compiler-Directed Energy Saving Strategy for Parallelizing Applications in On-chip Multiprocessors.In Proceedings of the Fourth International Symposium on Parallel and Distributed Computing(ISPDC-05).Lille,France.IEEE Computer Society.p.147-154.July 4-6 2005.
    [181]W.Liao and L.He.Power Modeling and Reduction of VLIW Processors.In Proceedings of Workshop on Compilers and Operating Systems for Low Power,in conjunction with International Conference on Parallel Architectures and Compilation Techniques.2001.
    [182]Chung-Hsing Hsu and U.Kremer.Single region vs.multiple regions:A comparison of different compiler-directed dynamic voltage scheduling approaches.In Proceedings of Workshop on Power-Aware Computer Systems(PACS-02).February 2002.
    [183]Xiaobo Fan,Carla S.Ellis and Alvin R.Lebeck.The Synergy between Power-aware Memory Systems and Processor Voltage Scaling.In Proceedings of the Workshop on Power-Aware Computer Systems(PACS-03).Dec.2003.
    [184]Albonesi D.H.Dynamic IPC/Clock Rate Optimization.In Proceedings of the 25th annual international symposium on Computer architecture.Barcelona,Spain.IEEE CS.p.282-292.1998.
    [185]Ponomarev D.,Kucuk G.and Ghose K.Dynamic Allocation of Datapath Resources for Low Power.In Proceedings of Workshop on Complexity-Effective Design(WCED-01),held in conjunction with ISCA.Goteborg,Sweden.June 2001.
    [186]Dropsho S.,Buyuktosunoglu A.,Balasubramonian R.and et al.Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power.In Proceedings of 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT-02). Charlottesville, VA, USA. IEEE CS.September 22-25 2002.
    
    [187]Buyuktosunoglu A., Karkhanis T., Albonesi D. H. and et al. Energy Efficient Co-Adaptive Instruction Fetch and Issue. In Proceedings of 30th International Symposium on Computer Architecture (ISCA-03). IEEE CS. p.147-156. June 9-11 2003.
    
    [188]Balasubramonian R., Albonesi D., Buyuktosunoglu A. and et al. Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures. In Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture. Monterey, California, USA.ACM/IEEE. p.245-257. December 10-13 2000.
    
    [189]W. Liao, J. Basile and L. He. Leakage Power Modeling and Reduction with Data Retention. In Proceedings of International Conference on Computer Aided Design.2002.
    
    [190]S. Manne, A. Klauser and D. Grunwald. Pipeline gating: Speculation control for energy reduction. In Proceedings of 25th International Symposium on Computer Architecture(ISCA-98).1998.
    
    [191]N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. Y. Kim and W. Ye. Energy-driven integrated hardware-software optimization using SimplePower. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA-00). June 2000.
    
    [192]E. Musoll. Predicting the usefulness of a block result: a micro-architectural technique for high-performance low-power processors. In Proceedings of 32nd Annual International Symposium on Microarchitecture. November 1999.
    
    [193]M. Pant, P. Pant, D. Wills and V. Tiwari. An Architectural Solution for the Inductive Noise Problem due to Clock-gating. In Proceedings of International Symposium on Low Power Electronics and Design. p.255-257.1999.
    
    [194]P. P. Chang, S. A. Mahlke, W. Y. Chen, N. J. Warter and W. Hwu. IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors. In Proceedings of International Symposium on Computer Architecture. May 1991.
    
    [195]Lee C., M. Potkonjak and W. H. Mangione-Smith. MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems. In Proceedings of 30th Annual International Symposium on Microarchitecture.p.330-335. December 1997.
    
    [196]A. Malik, B. Moyer and D. Cermak. A Lower Power Unified Cache Architecture Providing Power and Performance Flexibility. In Proceedings of International Symposium on Low Power Electronics and Design. June 2000.
    
    [197]C. Zhang, F. Vahid and W. Najjar. Energy Benefits of a Configurable Line Size Cache for Embedded Systems. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI.2003.
    
    [198]Andre Costi Nacul and Tony Givargis. Adaptive Online Cache Reconfiguration for Low Power Systems. Technical Report #03-01. April 23 2003.
    [199]W.T.Shiue and C.Chakrabarti.Memory Exploration for Low Power Embedded Systems.In Proceedings of Design Automation Conference(DAC-99).October 1999.
    [200]The AMRM project,http://wwwl.ics.uci.edu/amrm/.
    [201]Standard Performance Evaluation Corporation.http://www.specbench.org.
    [202]W.A.Wulf and S.A.McKee.Hitting the memory wall:implications of the obvious.Computer Architecture News,23.p.20-24.March 1995.
    [203]Abdel-Hameed Badawy,Aneesh Aggarwal,Donald Yeung and Chau-Wen Tseng.The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems.Journal of Instruction-Level Parallelism.2004.
    [204]Chen Ding and Ken Kennedy.The Memory Bandwidth Bottleneck and its Amelioration by a Compiler.In Proceedings of 2000 International Parallel and Distributed Processing Symposium.Cancun,Mexico.May 2000.
    [205]Deepak Agarwal and Donald Yeung.Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption.Technical Report.UMIACS-TR-2002-64.University of Maryland Institute for Advanced Computer Studies,2002.
    [206]Xia Jun (夏军).Research on Data Locality and Its Compiler Optimization Techniques (in Chinese).Ph.D.thesis.National University of Defense Technology.April 2004.
    [207]Todd C.Mowry.Tolerating Latency Through Software-Controlled Data Prefetching.Ph.D.thesis.Stanford University.Computer System Laboratory.March 1994.
    [208]Shimin Chen,Phillip B.Gibbons and Todd C.Mowry.Improving Index Performance through Prefetching.In Proceedings of the 2001 SIGMOD International Conference on Management of Data.p.235-246.May 2001.
    [209]Nathaniel McIntosh.Compiler Support for Software Prefetching.Ph.D.thesis.Rice University.Department of Computer Science.1998.
    [210]Yao Guo,Saurabh Chheda,Israel Koren,C.Mani Krishna and Csaba Andras Moritz.Energy-Aware Data Prefetching for General-Purpose Programs.In Proceedings of Workshop on Power Aware Computing Systems (PACS-04).p.78-94.December 2004.
    [211]Psilogeorgopoulos M.,Munteanu M.,Chuang T.and et al.Contemporary Techniques for Lower Power Circuit Design.PREST Deliverable D2.1:Tech Report.D2.1.the University of Sheffield,1998.
    [212]Gang Qu.What is the Limit of Energy Saving by Dynamic Voltage Scaling? In IEEE/ACM International Conference on Computer Aided Design.p.560-563.November 2001.
    [213]Chung-Hsing Hsu and Ulrich Kremer.Compiler-Directed Dynamic Voltage Scaling for Memory-Bound Applications.Technical Report.DCS-TR-498.Rutgers University,August 2002.
    [214]Transmeta Corp.http://www.transmeta.com/.
    [215]T.Martin.Balancing batteries,power and performance:system issues in cpu speed-setting for mobile computing. Ph.D. thesis. Carnegie Mellon University.1999.
    
    [216]J. Pouwelse, K. Langendoen and H. Sips. Dynamic voltage scaling on a low-power microprocessor. In Proceedings of The Seventh Annual International Conference on Mobile Computing and Networking. p.251 - 259.2001.
    
    [217] A. R. Lebeck, X. Fan, H. Zeng and C. Ellis. Power Aware Page Allocation. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-00). p.105 - 116.November 2000.
    
    [218]X. Fan, C. Ellis and A. R. Lebeck. Memory Controller Policies for DRAM Power Management. In Proceedings of International Symposium on Low Power Electronics and Design (ISLPED-01). p.129 - 134. August 2001.
    
    [219]X. Fan, C. S. Ellis and A. R. Lebeck. Modeling of DRAM power control policies using deterministic and stochastic petri nets. In Proceedings of Workshop on Power Aware Computing Systems. February 2002.
    
    [220]TM5400/TM5600 Data Book. Technology Report, November 1 2000.
    
    [221]Doug Burger and Todd M. Austin. The SimpleScalar tool set, Version 2.0.Technical Report:CS-TR-1342. University of Wisconsin-Madison, July 1997.
    
    [222]http://www.mathworks.com.
    
    [223]Tarek A. AlEnawy and Hakan Aydin. Energy-Constrained Performance Optimizations for Real-time Operating Systems. In Proceedings of Workshop on Compilers and Operating Systems for Low-Power (COLP-03). New Orleans,LA.2003.
    
    [224]Cosmin Rusu, Rami Melhem and Daniel Mosse. Maximizing Rewards for Real-time Applications with Energy Constraints. ACM Transactions on Embedded Computing Systems, 2(4). p. 537-559.2003.
    
    [225]Deepak N. Agarwal, Sumitkumar N. Pamnani, Gang Qu and Donald Yeung.Transferring Performance Gain from Software Prefetching to Energy Reduction. In Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS2004). Vancouver, Canada. May 2004.
    
    [226]W. F. Gunsteren and H. J. C. Berendsen. GROMOS: GROningen MOlecular Simulation software. tech. report. Laboratory of Physical Chemistry, University of Groningen, Netherlands, 1988.
    
    [227]Kumar R., Farkas K. I., Jouppi N. P. and et al. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. In Proceedings of the 36th International Symposium on Microarchitecture (MICRO-03). San Diego, CA, USA. ACM/IEEE.p.81-92. December 3-5 2003.
    
    [228]Weglarz E. F., Saluja K. K. and Lipasti M. H. Minimizing Energy Consumption for High-Performance Processing. In Proceedings of the 2002 conference on Asia South Pacific design automation/VLSI Design. Bangalore, India. IEEE CS. p. 199-206. January 7-11 2002.
    
    [229]Kandemir M., Zhang W. and Karakoy M. Runtime Code Parallelization for On-Chip Multiprocessors.In Proceedings of 2003 Design,Automation and Test in Europe Conference and Exposition(DATE-03).Munich,Germany.IEEE CS.p.10510-10515.March 3-7 2003.
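    [230]Wall D.W.Limits of Instruction-Level Parallelism.In Proceedings of Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.Santa Clara,California.ACM Press.p.176-188.April 8-11 1991.
    [231]Yi Huizhan (易会战) and Yang Xuejun (杨学军).Dynamic Characteristics of Instruction-Level Parallelism in Embedded Applications (in Chinese).In Proceedings of the 2005 China National Computer Conference(CNCC-05).Wuhan,China.Oct.2005.
    [232]Yi Huizhan (易会战) and Chen Juan (陈娟).Multi-core Microprocessor Architecture,Programming Models and Compilation Techniques (in Chinese).Technical Report.Compiler Group,Software Institute,National University of Defense Technology,Aug.2005.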
    [233]Tendler J.,Dodson J.,Fields J.J.S.and et al.POWER4 system microarchitecture.IBM Journal of Research and Development,46(1).p.5-25.2002.
    [234]Eggers S.J.,Emer J.S.,Levy H.M.and et al.SIMULTANEOUS MULTITHREADING:A Platform for Next-Generation Processors.IEEE Micro,17(5).p.12-19.1997.
    [235]Sohi G.S.and Roth A.Speculative Multithreaded Processors.IEEE Computer,34(4).p.66-73.2001.
    [236]Asanovic K.Vector Processors(Appendix G).Computer Architecture:A Quantitative Approach,Third Edition,Morgan Kaufmann.2002.
    [237]Kozyrakis C.E.,Perissakis S.,Patterson D.and et al.Scalable Processors in the Billion-Transistor Era:IRAM.IEEE Computer,30(9).p.75-78.1997.
    [238]Kozyrakis C.and Patterson D.Overcoming the Limitations of Conventional Vector Processors.In Proceedings of 30th International Symposium on Computer Architecture(ISCA-03).San Diego,California,USA.IEEE CS.p.399-409.June 9-11 2003.
    [239]Krashinsky R.,Batten C.,Hampton M.and et al.The Vector-Thread Architecture.In Proceedings of the 31st annual international symposium on Computer architecture(ISCA-04).München,Germany.IEEE CS.2004.
    [240]I.Kadayif,M.Kandemir and M.Karakoy.An Energy Saving Strategy Based on Adaptive Loop Parallelization.In Proceedings of Design Automation Conference (DAC'02).New Orleans,Louisiana,USA.p.195-200.June 10-14 2002.
    [241]http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.html.
    [242]http://www.sun.com/processors/UltraSPARC-IV/index.xml.
    [243]http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_8825,00.html.
    [244]J.L.Hennessy and D.A.Patterson.Computer Architecture:A Quantitative Approach,Elsevier Science Pte Ltd.third edition.2003.
    [245]Hu Qingfeng,Chi Lihua and Liu Jie.A New Load-Balancing Algorithm for Parallel Sparse Matrix-Vector Multiplication.In Proceedings of International Conference on Parallel Algorithms and Computing Environments.2003.
    
    [246]I. S. Duff, R. G. Grimes and J. G. Lewis. User's Guide for the Harwell-Boeing Sparse Matrix Collection. Tech. Report TR-PA-92-96. Toulouse Cedex, France.Oct 1992.
    
    [247] http://www.cise.ufl.edu/research/sparse/HBformat/HB/.
    
    [248]Naraig Manjikian. Multiprocessor Enhancements of the SimpleScalar Tool Set. Computer Architecture News, 29(1). p. 8-15. March 2001.
    
    [249]Jose Renau, Basilio Fraguela, James Tuck, Wei Liu, Milos Prvulovic, Luis Ceze,Smruti Sarangi, Paul Sack, Karin Strauss and Pablo Montesinos. SESC simulator.January 2005. http://sesc.sourceforge.net.
    
    [250]Pablo Montesinos Ortego and Paul Sack. SESC: SuperESCalar Simulator.December 20 2004. http://sesc.sourceforge.net/sescdoc.pdf.
