Minimizing GPU Kernel Launch Overhead in Deep Learning Inference on Mobile GPUs

Sumin Kim, Seunghwan Oh, Youngmin Yi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

The need for on-device real-time deep learning inference is increasing as deep learning on edge devices such as smartphones and robots becomes popular. Although hardware acceleration on NPUs is attracting more attention, recent mobile GPUs are fast enough to offer the potential for real-time inference of many CNNs. In this paper, we first analyze the inference time of widely used CNNs on recent mobile GPUs and reveal that significant overhead exists in GPU kernel launches. Then, we identify various factors that cause the kernel launch overhead, from which we formulate a performance model that predicts the kernel flush period leading to the minimal overhead. Our experimental results show speedups of up to 64% and 31% in the inference of various CNNs with TensorFlow Lite and the ARM Compute Library on the Adreno 650 and Mali G76 GPUs.
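
Below is a minimal sketch of the periodic kernel-flush idea the abstract describes, written against the standard OpenCL host API (the paper's Keywords indicate OpenCL). The function name, the per-layer kernel list, and the flush_period parameter are illustrative assumptions; the paper derives the actual flush period from its performance model rather than taking it as a constant.

```c
/* Illustrative sketch: enqueue one kernel per CNN layer and flush the
 * command queue every flush_period kernels, instead of flushing after
 * every kernel or only once at the end. All names here are assumptions,
 * not taken from the paper. */
#include <CL/cl.h>
#include <stddef.h>

void run_network(cl_command_queue queue,
                 cl_kernel *layer_kernels, size_t num_layers,
                 const size_t *global_sizes, size_t flush_period)
{
    for (size_t i = 0; i < num_layers; ++i) {
        /* Enqueue the i-th layer's kernel (1-D NDRange for simplicity;
         * passing NULL lets the runtime pick the work-group size). */
        clEnqueueNDRangeKernel(queue, layer_kernels[i], 1, NULL,
                               &global_sizes[i], NULL, 0, NULL, NULL);

        /* Periodic flush: submit the batched launch commands to the GPU.
         * The paper's performance model predicts the period that
         * minimizes launch overhead; here it is a caller parameter. */
        if ((i + 1) % flush_period == 0)
            clFlush(queue);
    }

    /* Block until all enqueued kernels have finished. */
    clFinish(queue);
}
```

The design trade-off this sketch encodes: flushing after every enqueue pays the submission cost once per kernel, while never flushing delays GPU work until the final blocking call; batching several launches between flushes balances the two, which is the trade-off the paper's model captures.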

Original language: English
Title of host publication: HotMobile 2021 - Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications
Publisher: Association for Computing Machinery, Inc
Pages: 57-63
Number of pages: 7
ISBN (Electronic): 9781450383233
DOIs
State: Published - 24 Feb 2021
Event: 22nd International Workshop on Mobile Computing Systems and Applications, HotMobile 2021 - Virtual, Online, United Kingdom
Duration: 24 Feb 2021 - 26 Feb 2021

Publication series

Name: HotMobile 2021 - Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications

Conference

Conference: 22nd International Workshop on Mobile Computing Systems and Applications, HotMobile 2021
Country/Territory: United Kingdom
City: Virtual, Online
Period: 24/02/21 - 26/02/21

Keywords

  • Deep Learning
  • Kernel Launch Overhead
  • Mobile GPU
  • OpenCL
