Hmfs: Efficient Support of Small Files Processing over HDFS
  • Authors: Cairong Yan (25)
    Tie Li (25)
    Yongfeng Huang (25)
    Yanglan Gan (25)
  • Keywords: HDFS; small files; middleware; asynchronous write; prefetching
  • Publication title: Lecture Notes in Computer Science
  • Year: 2014
  • Volume: 8631
  • Issue: 1
  • Pages: 54-67
  • Full-text size: 672 KB
  • Author affiliations: Cairong Yan (25)
    Tie Li (25)
    Yongfeng Huang (25)
    Yanglan Gan (25)

    25. School of Computer Science and Technology, Donghua University, 201620, Shanghai, China
  • ISSN:1611-3349
Abstract
Storing and accessing massive numbers of small files is one of the challenges in the design of distributed file systems. The Hadoop distributed file system (HDFS) is primarily designed for reliable storage and fast access of very large files, and it suffers a performance penalty as the number of small files increases. This paper proposes a middleware called Hmfs to improve the efficiency of storing and accessing small files on HDFS. It is made up of three layers: file operation interfaces, which make it easier for software developers to submit different file requests; file management tasks, which merge small files into big ones or extract small files from big ones in the background; and file buffers, which improve I/O performance. Hmfs boosts file upload speed by using an asynchronous write mechanism and file download speed by adopting a prefetching and caching strategy. The experimental results show that Hmfs can help to obtain high-speed storage and access for massive numbers of small files on HDFS.
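
The merge-and-extract idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the Hmfs implementation: the class name MergedFile, its methods, and the in-memory buffer standing in for one large HDFS file are all assumptions made purely for illustration. The sketch packs small files into one byte stream and keeps an index of (offset, length) pairs so that each small file can be read back individually.

```python
import io


class MergedFile:
    """Hypothetical container that packs small files into one large byte stream."""

    def __init__(self):
        # In-memory stand-in for a single large file (an assumption, not HDFS).
        self._data = io.BytesIO()
        # Index mapping each small file name to its (offset, length).
        self._index = {}

    def append(self, name, payload):
        # Move to the end of the merged stream and record where this file starts.
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(payload)
        self._index[name] = (offset, len(payload))

    def extract(self, name):
        # Look up the byte range and read only that slice back.
        offset, length = self._index[name]
        self._data.seek(offset)
        return self._data.read(length)


merged = MergedFile()
merged.append("a.txt", b"hello")
merged.append("b.txt", b"small files")
```

With an index of this kind, one large object holds many small files, so the number of entries the underlying file system has to track grows with the number of merged files rather than with the number of small files.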
