TY - GEN
T1 - Iteratively Selecting an Easy Reference Frame Makes Unsupervised Video Object Segmentation Easier
AU - Lee, Youngjo
AU - Seong, Hongje
AU - Kim, Euntai
N1 - Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org).
PY - 2022/6/30
Y1 - 2022/6/30
N2 - Unsupervised video object segmentation (UVOS) is a per-pixel binary labeling problem which aims at separating the foreground object from the background in the video without using the ground truth (GT) mask of the foreground object. Most of the previous UVOS models use the first frame or the entire video as a reference frame to specify the mask of the foreground object. Our question is why the first frame should be selected as a reference frame or why the entire video should be used to specify the mask. We believe that we can select a better reference frame to achieve the better UVOS performance than using only the first frame or the entire video as a reference frame. In our paper, we propose Easy Frame Selector (EFS). The EFS enables us to select an “easy” reference frame that makes the subsequent VOS become easy, thereby improving the VOS performance. Furthermore, we propose a new framework named as Iterative Mask Propagation (IMP). In the framework, we repeat applying EFS to the given video and selecting an “easier” reference frame from the video than the previous iteration, increasing the VOS performance incrementally. The IMP consists of EFS, Bi-directional Mask Prediction (BMP), and Temporal Information Updating (TIU). From the proposed framework, we achieve state-of-the-art performance in three UVOS benchmark sets: DAVIS16, FBMS, and SegTrack-V2.
AB - Unsupervised video object segmentation (UVOS) is a per-pixel binary labeling problem which aims at separating the foreground object from the background in the video without using the ground truth (GT) mask of the foreground object. Most of the previous UVOS models use the first frame or the entire video as a reference frame to specify the mask of the foreground object. Our question is why the first frame should be selected as a reference frame or why the entire video should be used to specify the mask. We believe that we can select a better reference frame to achieve the better UVOS performance than using only the first frame or the entire video as a reference frame. In our paper, we propose Easy Frame Selector (EFS). The EFS enables us to select an “easy” reference frame that makes the subsequent VOS become easy, thereby improving the VOS performance. Furthermore, we propose a new framework named as Iterative Mask Propagation (IMP). In the framework, we repeat applying EFS to the given video and selecting an “easier” reference frame from the video than the previous iteration, increasing the VOS performance incrementally. The IMP consists of EFS, Bi-directional Mask Prediction (BMP), and Temporal Information Updating (TIU). From the proposed framework, we achieve state-of-the-art performance in three UVOS benchmark sets: DAVIS16, FBMS, and SegTrack-V2.
UR - http://www.scopus.com/inward/record.url?scp=85138193627&partnerID=8YFLogxK
U2 - 10.1609/aaai.v36i2.20011
DO - 10.1609/aaai.v36i2.20011
M3 - Conference contribution
AN - SCOPUS:85138193627
T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
SP - 1245
EP - 1253
BT - AAAI-22 Technical Tracks 2
PB - Association for the Advancement of Artificial Intelligence
T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Y2 - 22 February 2022 through 1 March 2022
ER -