TY - GEN
T1 - Hybrid Hadoop
T2 - 2021 International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2021
AU - Oh, Chanyoung
AU - Jung, Hyeonjin
AU - Yi, Saehanseul
AU - Yoon, Illo
AU - Yi, Youngmin
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/1/20
Y1 - 2021/1/20
N2 - As GPUs have become an essential component of high-performance computing, many works have attempted to leverage GPU computing in Hadoop. However, few works have considered fully utilizing the GPU in Hadoop, and only a few have studied utilizing both the CPU and the GPU at the same time. In this paper, we propose CPU-GPU hybrid scheduling in Hadoop, where both the CPUs and the GPUs in a node are exploited as much as possible in an adaptive manner. The technical barrier is that the optimal number of GPU tasks is not known in advance, and the total number of Containers in a node cannot be changed once a Hadoop job starts. In the proposed approach, we first determine the initial number of Containers as well as the hybrid execution mode; the proposed dynamic scheduler then adjusts the number of Containers for the GPU and the CPU with the help of a GPU monitor during job execution. It also employs a load-balancing algorithm for the tail. Experiments with various benchmarks show that the proposed CPU-GPU hybrid scheduling achieves a 3.79× speedup on average over 12-core CPU-only Hadoop.
AB - As GPUs have become an essential component of high-performance computing, many works have attempted to leverage GPU computing in Hadoop. However, few works have considered fully utilizing the GPU in Hadoop, and only a few have studied utilizing both the CPU and the GPU at the same time. In this paper, we propose CPU-GPU hybrid scheduling in Hadoop, where both the CPUs and the GPUs in a node are exploited as much as possible in an adaptive manner. The technical barrier is that the optimal number of GPU tasks is not known in advance, and the total number of Containers in a node cannot be changed once a Hadoop job starts. In the proposed approach, we first determine the initial number of Containers as well as the hybrid execution mode; the proposed dynamic scheduler then adjusts the number of Containers for the GPU and the CPU with the help of a GPU monitor during job execution. It also employs a load-balancing algorithm for the tail. Experiments with various benchmarks show that the proposed CPU-GPU hybrid scheduling achieves a 3.79× speedup on average over 12-core CPU-only Hadoop.
KW - CPU-GPU Heterogeneous Computing
KW - Distributed Systems
KW - Hadoop
KW - Performance Estimation
UR - http://www.scopus.com/inward/record.url?scp=85099878910&partnerID=8YFLogxK
U2 - 10.1145/3432261.3432264
DO - 10.1145/3432261.3432264
M3 - Conference contribution
AN - SCOPUS:85099878910
T3 - ACM International Conference Proceeding Series
SP - 40
EP - 49
BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2021
PB - Association for Computing Machinery
Y2 - 20 January 2021 through 22 January 2021
ER -