Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

Sanghyun Woo, Dahun Kim, Kwanyong Park, Joon Young Lee, In So Kweon

Research output: Contribution to conference › Paper › peer-review


Abstract

Video inpainting is more challenging than image inpainting because of the extra temporal dimension: the inpainted content must be globally coherent in both space and time. A natural solution is to aggregate features from other frames, and existing state-of-the-art methods therefore rely heavily on 3D convolution or optical flow. However, these methods emphasize temporally nearby frames, and long-term temporal information is not sufficiently exploited. In this work, we propose a novel two-stage alignment method. The first stage is an alignment module that uses homographies computed between the target frame and the reference frames; the visible patches are then aggregated according to frame similarity to roughly fill in the target holes. The second stage is an attention module that matches the generated patches against known reference patches in a non-local manner to refine the result of the global alignment stage. Both stages operate over a large spatio-temporal reference window and can therefore model long-range correlations between distant information and the hole regions. The proposed model handles even challenging scenes with large or slowly moving holes, which existing approaches have struggled to model. Experiments on video object removal demonstrate that our method significantly outperforms previous state-of-the-art learning-based approaches.
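To make the second (attention) stage more concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: features of the coarsely filled target frame attend non-locally over features of all reference frames, and the aggregated result is written back only inside the hole region. The tensor shapes, the scaled softmax-attention form, and the function name non_local_refine are illustrative assumptions.

    # Minimal sketch (assumptions, not the paper's code): non-local attention
    # from target-frame features to reference-frame features for hole refinement.
    import torch
    import torch.nn.functional as F

    def non_local_refine(target_feat, ref_feats, hole_mask):
        """
        target_feat: (C, H, W)   coarsely filled target features (after stage 1)
        ref_feats:   (T, C, H, W) features of T reference frames
        hole_mask:   (1, H, W)   1 inside the original hole, 0 elsewhere
        Returns refined target features of shape (C, H, W).
        """
        C, H, W = target_feat.shape
        q = target_feat.reshape(C, H * W).t()             # (HW, C) queries
        k = ref_feats.permute(1, 0, 2, 3).reshape(C, -1)  # (C, T*HW) keys
        v = k.t()                                         # (T*HW, C) values

        attn = F.softmax(q @ k / C ** 0.5, dim=-1)        # (HW, T*HW) similarities
        refined = (attn @ v).t().reshape(C, H, W)         # aggregate reference content

        # keep known (non-hole) content; refine only inside the hole
        return target_feat * (1 - hole_mask) + refined * hole_mask

    # toy usage
    tgt = torch.randn(64, 32, 32)
    refs = torch.randn(5, 64, 32, 32)
    mask = (torch.rand(1, 32, 32) > 0.8).float()
    print(non_local_refine(tgt, refs, mask).shape)  # torch.Size([64, 32, 32])

Because the keys and values span every reference frame, the attention window is spatio-temporally large, which is the property the abstract highlights for capturing long-range correlations.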

Original language: English
State: Published - 2020
Event: 31st British Machine Vision Conference, BMVC 2020 - Virtual, Online
Duration: 7 Sep 2020 → 10 Sep 2020

Conference

Conference: 31st British Machine Vision Conference, BMVC 2020
City: Virtual, Online
Period: 7/09/20 → 10/09/20
