Enhancing Visual Classification Using Comparative Descriptors

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The performance of vision-language models (VLMs), such as CLIP, in visual classification tasks, has been enhanced by leveraging semantic knowledge from large language models (LLMs), including GPT. Recent studies have shown that in zero-shot classification tasks, descriptors incorporating additional cues, high-level concepts, or even random characters often outperform those using only category names. In many classification tasks, while the top-1 accuracy may be relatively low, the top-5 accuracy is often significantly higher. This gap implies that most misclassifications occur among a few similar classes, highlighting the model's difficulty in distinguishing between classes with subtle differences. To address this challenge, we introduce a novel concept of comparative descriptors. These descriptors emphasize the unique features of a target class against its most similar classes, enhancing differentiation. By generating and integrating these comparative descriptors into the classification framework, we refine the semantic focus and improve classification accuracy. An additional filtering process ensures that these descriptors are closer to the image embeddings in the CLIP space, further enhancing performance. Our approach demonstrates improved accuracy and robustness in visual classification tasks by addressing the specific challenge of subtle inter-class differences. Code is available at ht tps: //github.com/hklee

Original languageEnglish
Title of host publicationProceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5274-5283
Number of pages10
ISBN (Electronic)9798331510831
DOIs
StatePublished - 2025
Event2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 - Tucson, United States
Duration: 28 Feb 20254 Mar 2025

Publication series

NameProceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025

Conference

Conference2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Country/TerritoryUnited States
CityTucson
Period28/02/254/03/25

Fingerprint

Dive into the research topics of 'Enhancing Visual Classification Using Comparative Descriptors'. Together they form a unique fingerprint.

Cite this