TY - GEN
T1 - Learnable Negative Proposals Using Dual-Signed Cross-Entropy Loss for Weakly Supervised Video Moment Localization
AU - Kim, Sunoh
AU - Um, Daeho
AU - Choi, Hyun Jun
AU - Choi, Jin Young
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/10/28
Y1 - 2024/10/28
AB - Most existing methods for weakly supervised video moment localization use rule-based negative proposals. However, rule-based proposals are limited in their ability to capture the various confusing locations throughout the entire video. To alleviate this limitation, we propose learning-based negative proposals, which are trained using a dual-signed cross-entropy loss. This loss is controlled by a weight that changes gradually from a negative value to a positive one. The negative weight trains the negative proposals to capture query-irrelevant temporal boundaries (easy negatives) in the earlier training stages, whereas the positive weight trains them to capture somewhat query-relevant temporal boundaries (hard negatives) in the later training stages. To evaluate the quality of negative proposals, we introduce a new evaluation metric that measures how well a negative proposal captures a poorly generated positive proposal. We verify that our negative proposals can be applied with negligible additional parameters and inference costs, achieving state-of-the-art performance on three public datasets.
KW - dual-signed cross-entropy loss
KW - evaluation metric
KW - learning-based negative proposal
KW - video moment localization
UR - https://www.scopus.com/pages/publications/85209813089
U2 - 10.1145/3664647.3681304
DO - 10.1145/3664647.3681304
M3 - Conference contribution
AN - SCOPUS:85209813089
T3 - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
SP - 5318
EP - 5327
BT - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 32nd ACM International Conference on Multimedia, MM 2024
Y2 - 28 October 2024 through 1 November 2024
ER -