文摘
The storage and access of massive small files are one of the challenges in the design of distributed file system. Hadoop distributed file system (HDFS) is primarily designed for reliable storage and fast access of very big files while it suffers a performance penalty with increasing number of small files. A middleware called Hmfs is proposed in this paper to improve the efficiency of storing and accessing small files on HDFS. It is made up of three layers, file operation interfaces to make it easier for software developers to submit different file requests, file management tasks to merge small files into big ones or extract small files from big ones in the background, and file buffers to improve the I/O performance. Hmfs boosts the file upload speed by using asynchronous write mechanism and the file download speed by adopting prefetching and caching strategy. The experimental results show that Hmfs can help to obtain high speed of storage and access for massive small files on HDFS.