TY - GEN
T1 - New feature-level video classification via temporal attention model
AU - Seong, Hongje
AU - Woo, Suhan
AU - Hyun, Junhyuk
AU - Chang, Hyunbae
AU - Lee, Suhyeon
AU - Kim, Euntai
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/10/15
Y1 - 2018/10/15
N2 - CoVieW 2018 is a new challenge that aims at simultaneous scene and action recognition for untrimmed videos [1]. In the challenge, frame-level video features extracted by a pre-trained deep convolutional neural network (CNN) are provided for video-level classification. In this paper, a new approach to video-level classification is proposed. The proposed method focuses on analysis in the temporal domain, and a temporal attention model is developed. To compensate for the differing lengths of videos, a temporal padding method is also developed to unify video lengths. Further, data augmentation is performed to enhance validation accuracy. On the train/validation split of the CoVieW 2018 dataset, the proposed method, combining the temporal attention model, nonzero padding, and data augmentation, achieves 95.53% accuracy on scene recognition and 87.17% accuracy on action recognition. Under the top-1 Hamming score, the standard metric of the CoVieW 2018 challenge, the proposed method obtains 91.35%.
AB - CoVieW 2018 is a new challenge that aims at simultaneous scene and action recognition for untrimmed videos [1]. In the challenge, frame-level video features extracted by a pre-trained deep convolutional neural network (CNN) are provided for video-level classification. In this paper, a new approach to video-level classification is proposed. The proposed method focuses on analysis in the temporal domain, and a temporal attention model is developed. To compensate for the differing lengths of videos, a temporal padding method is also developed to unify video lengths. Further, data augmentation is performed to enhance validation accuracy. On the train/validation split of the CoVieW 2018 dataset, the proposed method, combining the temporal attention model, nonzero padding, and data augmentation, achieves 95.53% accuracy on scene recognition and 87.17% accuracy on action recognition. Under the top-1 Hamming score, the standard metric of the CoVieW 2018 challenge, the proposed method obtains 91.35%.
KW - Convolutional neural network
KW - CoVieW 2018
KW - Data augmentation
KW - Temporal attention
KW - Temporal padding
KW - Untrimmed video classification
UR - http://www.scopus.com/inward/record.url?scp=85058182380&partnerID=8YFLogxK
U2 - 10.1145/3265987.3265990
DO - 10.1145/3265987.3265990
M3 - Conference contribution
AN - SCOPUS:85058182380
T3 - CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018
SP - 31
EP - 34
BT - CoVieW 2018 - Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, co-located with MM 2018
PB - Association for Computing Machinery, Inc
T2 - 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild, CoVieW 2018, in conjunction with ACM Multimedia, MM 2018
Y2 - 22 October 2018
ER -