TY - GEN
T1 - Minimizing GPU Kernel Launch Overhead in Deep Learning Inference on Mobile GPUs
AU - Kim, Sumin
AU - Oh, Seunghwan
AU - Yi, Youngmin
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/2/24
Y1 - 2021/2/24
N2 - The need for on-device real-time Deep Learning inference is increasing as deep learning on edge devices such as smartphones and robots is becoming popular. Although hardware acceleration on NPUs is attracting more attention, recent mobile GPUs are fast enough to enable real-time inference of many CNNs. In this paper, we first analyze the inference time of widely used CNNs on recent mobile GPUs and reveal that significant overhead exists for GPU kernel launches. Then, we identify various factors that cause the kernel launch overhead, from which we formulate a performance model that predicts the optimal period for kernel flushes that leads to minimal overhead. Our experimental results show speedups of up to 64% and 31% in the inference of various CNNs with TensorFlow Lite and ARM Compute Library on the Adreno 650 and Mali G76 GPUs, respectively.
AB - The need for on-device real-time Deep Learning inference is increasing as deep learning on edge devices such as smartphones and robots is becoming popular. Although hardware acceleration on NPUs is attracting more attention, recent mobile GPUs are fast enough to enable real-time inference of many CNNs. In this paper, we first analyze the inference time of widely used CNNs on recent mobile GPUs and reveal that significant overhead exists for GPU kernel launches. Then, we identify various factors that cause the kernel launch overhead, from which we formulate a performance model that predicts the optimal period for kernel flushes that leads to minimal overhead. Our experimental results show speedups of up to 64% and 31% in the inference of various CNNs with TensorFlow Lite and ARM Compute Library on the Adreno 650 and Mali G76 GPUs, respectively.
KW - Deep Learning
KW - Kernel Launch Overhead
KW - Mobile GPU
KW - OpenCL
UR - http://www.scopus.com/inward/record.url?scp=85102071802&partnerID=8YFLogxK
U2 - 10.1145/3446382.3448606
DO - 10.1145/3446382.3448606
M3 - Conference contribution
AN - SCOPUS:85102071802
T3 - HotMobile 2021 - Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications
SP - 57
EP - 63
BT - HotMobile 2021 - Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications
PB - Association for Computing Machinery, Inc
T2 - 22nd International Workshop on Mobile Computing Systems and Applications, HotMobile 2021
Y2 - 24 February 2021 through 26 February 2021
ER -