TY - GEN
T1 - Self-supervised Monocular Depth Estimation from Thermal Images via Adversarial Multi-spectral Adaptation
AU - Shin, Ukcheol
AU - Park, Kwanyong
AU - Lee, Byeong Uk
AU - Lee, Kyunghyun
AU - Kweon, In So
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Recently, thermal-image-based 3D understanding has been attracting attention for illumination-agnostic machine vision. However, the difficulty with thermal images lies in insufficient training supervision due to their low-contrast and textureless properties. Also, introducing an additional modality imposes further constraints, such as complicated multi-sensor calibration and synchronized data acquisition. To leverage additional modality information without such constraints, we propose a novel training framework that consists of self-supervised learning from unpaired multi-spectral images and feature-level adversarial adaptation. In the training stage, we utilize unpaired RGB/thermal video and a partially shared network architecture consisting of modality-specific feature extractors and a modality-independent decoder. Through the shared network design, the depth decoder can leverage the self-supervised signal of the unpaired RGB images. Feature-level adversarial adaptation minimizes the gap between RGB and thermal features and eventually makes the thermal encoder extract representative and informative features. Based on the proposed method, the trained depth network outperforms previous state-of-the-art methods.
AB - Recently, thermal-image-based 3D understanding has been attracting attention for illumination-agnostic machine vision. However, the difficulty with thermal images lies in insufficient training supervision due to their low-contrast and textureless properties. Also, introducing an additional modality imposes further constraints, such as complicated multi-sensor calibration and synchronized data acquisition. To leverage additional modality information without such constraints, we propose a novel training framework that consists of self-supervised learning from unpaired multi-spectral images and feature-level adversarial adaptation. In the training stage, we utilize unpaired RGB/thermal video and a partially shared network architecture consisting of modality-specific feature extractors and a modality-independent decoder. Through the shared network design, the depth decoder can leverage the self-supervised signal of the unpaired RGB images. Feature-level adversarial adaptation minimizes the gap between RGB and thermal features and eventually makes the thermal encoder extract representative and informative features. Based on the proposed method, the trained depth network outperforms previous state-of-the-art methods.
KW - 3D computer vision
KW - Applications: Robotics
UR - https://www.scopus.com/pages/publications/85148996006
U2 - 10.1109/WACV56688.2023.00575
DO - 10.1109/WACV56688.2023.00575
M3 - Conference contribution
AN - SCOPUS:85148996006
T3 - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
SP - 5787
EP - 5796
BT - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
Y2 - 3 January 2023 through 7 January 2023
ER -