With the rise of big data, big data technologies are becoming more important every day. Hadoop in particular has become a critical part of the overall big data system because of its ability to store, process, and analyze thousands of terabytes of data. A major challenge in sustaining high performance on Hadoop is managing data growth while satisfying heavy storage I/O demand. Hadoop's overall performance is largely determined by storage input/output (I/O), yet storage I/O technologies remain very limited. As a result, studies on improving storage I/O in the Hadoop Distributed File System (HDFS) have been gaining popularity. The latest trend in storage systems is to use hybrid storage devices. However, it is not easy to use information about heterogeneous storage devices in HDFS, because HDFS cannot yet exploit storage type information when reading data. In this paper, we propose a hybrid block-selection method for HDFS that considers the storage type, such as SSD or HDD, when reading data. With this method, the Hadoop ecosystem gives priority to the high bandwidth of SSDs, which improves its overall performance. In our experiments, the new method reduced the execution time of a select count(∗) query and of the TPC-H benchmark by up to 22% and by 30% on average, respectively.
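The abstract does not spell out the selection rule, so the following is only a minimal sketch of what a storage-aware read path might look like, not the authors' implementation. It assumes each block replica reports a storage type and a network distance from the reader, and it prefers SSD-backed replicas before falling back to the closest one. The names `Replica`, `STORAGE_PRIORITY`, and `select_replica` are hypothetical and are not part of the HDFS API.

```python
from dataclasses import dataclass

# Hypothetical priority table (an assumption): a lower value is preferred for reads.
STORAGE_PRIORITY = {"SSD": 0, "HDD": 1}


@dataclass(frozen=True)
class Replica:
    datanode: str      # hypothetical DataNode name
    storage_type: str  # "SSD" or "HDD"
    distance: int      # assumed network distance from the reader (0 = local)


def select_replica(replicas):
    """Pick the replica to read: SSD-backed first, then the closest one."""
    if not replicas:
        raise ValueError("block has no replicas")
    # Unknown storage types sort after every known type.
    fallback = len(STORAGE_PRIORITY)
    return min(
        replicas,
        key=lambda r: (STORAGE_PRIORITY.get(r.storage_type, fallback), r.distance),
    )


replicas = [
    Replica("dn1", "HDD", 0),
    Replica("dn2", "SSD", 2),
    Replica("dn3", "HDD", 4),
]
print(select_replica(replicas).datanode)  # prints "dn2"
```

In this sketch the storage type outranks network distance, so a remote SSD replica wins over a local HDD replica. That ordering is an assumption made for illustration; the paper's method may weigh locality and storage type differently.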
|Title of host publication||2016 Symposium on Applied Computing, SAC 2016|
|Publisher||Association for Computing Machinery|
|Number of pages||7|
|Publication status||Published - 2016 Apr 4|
|Event||31st Annual ACM Symposium on Applied Computing, SAC 2016 - Pisa, Italy (Duration: 2016 Apr 4 → 2016 Apr 8)|
|Name||Proceedings of the ACM Symposium on Applied Computing|
|Other||31st Annual ACM Symposium on Applied Computing, SAC 2016|
|Period||16/4/4 → 16/4/8|
Bibliographical note
Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2015R1A2A1A05001845).
© 2016 ACM.
All Science Journal Classification (ASJC) codes