TY - GEN
T1 - An efficient parallel motion estimation algorithm and X264 parallelization in CUDA
AU - Ko, Youngsub
AU - Yi, Youngmin
AU - Ha, Soonhoi
PY - 2011
Y1 - 2011
N2 - H.264/AVC video encoders are widely used for their high coding efficiency. Because the computational demand, which is proportional to the frame resolution, is constantly increasing, there has been great interest in accelerating H.264/AVC encoding through parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grained data parallelism. Despite extensive research effort to use GPUs to accelerate H.264/AVC encoding, these efforts have not achieved any speed-up over x264, which is known as the fastest CPU implementation, because of the significant communication overhead between the host CPU and the GPU and the intra-frame dependency in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, that effectively hides the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves a speed-up of more than 20% compared with x264.
AB - H.264/AVC video encoders are widely used for their high coding efficiency. Because the computational demand, which is proportional to the frame resolution, is constantly increasing, there has been great interest in accelerating H.264/AVC encoding through parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general-purpose applications by exploiting fine-grained data parallelism. Despite extensive research effort to use GPUs to accelerate H.264/AVC encoding, these efforts have not achieved any speed-up over x264, which is known as the fastest CPU implementation, because of the significant communication overhead between the host CPU and the GPU and the intra-frame dependency in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, that effectively hides the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves a speed-up of more than 20% compared with x264.
KW - CUDA
KW - GPU
KW - H.264
KW - Motion Estimation
UR - http://www.scopus.com/inward/record.url?scp=84857758850&partnerID=8YFLogxK
U2 - 10.1109/DASIP.2011.6136860
DO - 10.1109/DASIP.2011.6136860
M3 - Conference contribution
AN - SCOPUS:84857758850
SN - 9781457706196
T3 - Conference on Design and Architectures for Signal and Image Processing, DASIP
SP - 91
EP - 98
BT - Proceedings of the 2011 Conference on Design and Architectures for Signal and Image Processing, DASIP 2011
T2 - 2011 Conference on Design and Architectures for Signal and Image Processing, DASIP 2011
Y2 - 2 November 2011 through 4 November 2011
ER -