LUTMap: A Dynamic Heuristic Application Mapping Algorithm Based on Lookup Tables


Keywords: Multicore; Network-on-chip; Application mapping; Lookup table
Journal title: Lecture Notes in Computer Science
Publication year: 2016
Volume: 9864
Issue: 1
Pages: 134-146
Full-text size: 1,013 KB
Author affiliations: Thomas Canhao Xu (22)
Ville Leppänen (22)

22. Department of Information Technology, University of Turku, 20014, Turku, Finland
Series title: Internet and Distributed Computing Systems
ISBN: 978-3-319-45940-0
Publication category: Computer Science
Publication subjects: Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
Publisher: Springer Berlin / Heidelberg
ISSN: 1611-3349
Volume order: 9864

Abstract

In this paper, we propose and investigate a dynamic heuristic mapping algorithm with lookup table optimizations. Distributed and parallel computing are trends driven by the performance requirements of modern applications. Application mapping in a multiprocessor system is therefore critical, given the dynamic and unpredictable nature of the applications. We analyse the communication delay among the tasks of an application. A fundamental algorithm is analysed that optimizes the average delay of the mapping region. We discuss and evaluate the effectiveness of the algorithm in terms of average intra-application latency. Results from synthetic applications show that the average latencies of the mapping regions produced by the fundamental algorithm are reduced by up to 23% compared with incremental mapping. Noting the time overhead of the algorithm caused by its larger search space, we introduce a mechanism based on lookup tables to speed up the search for optimized mapping regions. The lookup table is evaluated in terms of both size and construction time. Experiments show that the lookup table is small enough to fit into the cache and that the table can be constructed in milliseconds in most practical cases. Results from real applications show that the average execution time of applications under the proposed algorithm is reduced by 15.2% compared with the first-fit algorithm.
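
The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea of precomputing low-delay mapping regions in a lookup table. It assumes a 2D mesh network-on-chip, uses the average pairwise Manhattan (hop) distance as a stand-in for average communication delay, and considers only rectangular candidate regions filled in row-major order. The function names `avg_pairwise_distance`, `best_rectangle`, and `build_lookup_table` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def avg_pairwise_distance(cores):
    """Average Manhattan (hop) distance over all unordered pairs of cores.

    Assumption: hop distance on a 2D mesh approximates communication delay.
    """
    pairs = list(combinations(cores, 2))
    if not pairs:
        return 0.0
    total = sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in pairs)
    return total / len(pairs)


def best_rectangle(n_tasks, mesh_w, mesh_h):
    """Exhaustively search w x h rectangles that can hold n_tasks cores.

    Each candidate uses the first n_tasks cores of the rectangle in
    row-major order. Returns (score, w, h) for the lowest-scoring shape;
    ties keep the first shape found.
    """
    best = None
    for w in range(1, mesh_w + 1):
        for h in range(1, mesh_h + 1):
            if w * h < n_tasks:
                continue  # rectangle too small for this application
            cores = [(x, y) for y in range(h) for x in range(w)][:n_tasks]
            score = avg_pairwise_distance(cores)
            if best is None or score < best[0]:
                best = (score, w, h)
    return best


def build_lookup_table(max_tasks, mesh_w, mesh_h):
    """Precompute the best region shape for every application size.

    At run time the mapper can read the shape from the table instead of
    repeating the search for each arriving application.
    """
    return {n: best_rectangle(n, mesh_w, mesh_h) for n in range(1, max_tasks + 1)}


table = build_lookup_table(max_tasks=8, mesh_w=4, mesh_h=4)
score, width, height = table[4]
print(f"4 tasks: {width}x{height} region, average distance {score:.3f}")
```

In this sketch the search cost is paid once when the table is built, and each mapping request becomes a dictionary lookup. A real mapper would also need to account for which cores are currently free, which this example ignores.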
