Load Balancing in MapReduce Based on Data Locality
  • Authors: Yi Chen (24)
    Zhaobin Liu (24)
    Tingting Wang (24)
    Lu Wang (25)
  • Publication: Lecture Notes in Computer Science
  • Year of publication: 2014
  • Volume: 8630
  • Issue: 1
  • Pages: 229-241
  • Full-text size: 354 KB
  • Author affiliations:
    24. School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, P.R. China
    25. China Academy of Civil Aviation Science and Technology, Beijing, 100028, P.R. China
  • ISSN:1611-3349
Abstract
With the explosive growth of data in the information era, MapReduce, a programming model that processes data in parallel, has been widely adopted. However, the original system has gradually exposed some shortcomings. For example, handling skewed data can leave the system load imbalanced. After the mappers process the data, the results are sent to the reducers according to a partition function. An inappropriate partition algorithm may lead to degraded network performance, overloading of some reducers, and a longer job execution time. In short, processing skewed data with an inappropriate algorithm harms system performance. To solve the load imbalance problem and improve cluster performance, we design an effective partition algorithm to guide the assignment of data. The algorithm, named CLP (Cluster Locality Partition), consists of three parts: a Preprocess part, a Data-Cluster part, and a Locality-Partition part. The experimental results show that the proposed algorithm outperforms the default partition algorithm in terms of execution time and load balancing.
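
The load imbalance described in the abstract can be illustrated with a minimal sketch. It is not the CLP algorithm, whose details are not given here. It only simulates a generic hash partitioner (hash of the key modulo the number of reducers) on skewed map output. The function names, the use of CRC32 as the hash, and the sample data are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_reducers: int) -> int:
    """Stand-in for a default hash partitioner: hash(key) mod reducer count.

    CRC32 is an assumption, chosen only because it is deterministic across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records, num_reducers):
    """Count how many intermediate (key, value) records each reducer receives."""
    loads = Counter()
    for key, _value in records:
        loads[hash_partition(key, num_reducers)] += 1
    # Return a dense list so reducers that received nothing show up as 0.
    return [loads[r] for r in range(num_reducers)]


def imbalance(loads):
    """Ratio of the heaviest reducer's load to the mean load (1.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical skewed map output: a single hot key accounts for 90% of records.
skewed = [("hot", 1)] * 900 + [(f"key{i}", 1) for i in range(100)]
loads = reducer_loads(skewed, num_reducers=4)
print("records per reducer:", loads)
print("max/mean load ratio:", round(imbalance(loads), 2))
```

Because every record with the same key goes to the same reducer, the reducer that receives the hot key carries at least 900 of the 1000 records no matter which hash function is used. A skew-aware partitioner aims to avoid exactly this kind of imbalance.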
