Abstract
The performance of vision-language models (VLMs) such as CLIP in visual classification tasks has been enhanced by leveraging semantic knowledge from large language models (LLMs) such as GPT. Recent studies have shown that, in zero-shot classification, descriptors incorporating additional cues, high-level concepts, or even random characters often outperform those that use only category names. In many classification tasks, top-1 accuracy is relatively low while top-5 accuracy is substantially higher. This gap implies that most misclassifications occur among a few similar classes, which highlights the model's difficulty in distinguishing classes that differ only subtly. To address this challenge, we introduce the concept of comparative descriptors. These descriptors emphasize the unique features of a target class relative to its most similar classes, improving differentiation. By generating comparative descriptors and integrating them into the classification framework, we refine the semantic focus and improve classification accuracy. An additional filtering step ensures that these descriptors lie closer to the image embeddings in the CLIP space, further improving performance. By addressing the specific challenge of subtle inter-class differences, our approach demonstrates improved accuracy and robustness in visual classification tasks. Code is available at https://github.com/hklee
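The abstract outlines the pipeline without its details, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' code. It (1) finds each class's most similar classes by cosine similarity of class-name text embeddings, (2) builds an LLM request for comparative descriptors (the prompt wording and `comparative_prompt` are hypothetical), (3) filters descriptors by closeness to reference image embeddings (the mean-cosine, keep-top-`keep` rule in `filter_descriptors` is an assumption), (4) scores each class by the mean cosine similarity between the image and that class's descriptors (a common descriptor-averaging scheme, assumed here), and (5) computes top-k accuracy to expose the top-1 versus top-k gap. The toy 3-d vectors stand in for real CLIP embeddings.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale rows to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def most_similar_classes(class_emb: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k classes whose text embeddings are closest to each class."""
    e = l2_normalize(class_emb)
    sim = e @ e.T
    np.fill_diagonal(sim, -np.inf)  # a class is never its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]


def comparative_prompt(target: str, similar: list) -> str:
    """Hypothetical LLM prompt asking what separates `target` from its look-alikes."""
    return ("List visual features that distinguish a " + target
            + " from these similar classes: " + ", ".join(similar) + ".")


def filter_descriptors(desc_emb: np.ndarray, ref_img_emb: np.ndarray, keep: int) -> np.ndarray:
    """Assumed filter: keep the `keep` descriptors with the highest mean cosine
    similarity to the reference image embeddings (original order preserved)."""
    closeness = (l2_normalize(desc_emb) @ l2_normalize(ref_img_emb).T).mean(axis=1)
    best = np.sort(np.argsort(-closeness)[:keep])
    return desc_emb[best]


def class_scores(img_emb: np.ndarray, descs_per_class: list) -> np.ndarray:
    """Score of class c for each image = mean cosine similarity to c's descriptors."""
    imgs = l2_normalize(img_emb)
    return np.stack(
        [(imgs @ l2_normalize(d).T).mean(axis=1) for d in descs_per_class], axis=1
    )


def topk_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of images whose true label is among the k highest-scoring classes."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    return float((topk == labels[:, None]).any(axis=1).mean())


if __name__ == "__main__":
    # Toy 3-d vectors standing in for CLIP text/image embeddings.
    names = ["husky", "malamute", "tabby cat"]
    class_emb = np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0]])
    nbrs = most_similar_classes(class_emb, k=1)
    print(comparative_prompt(names[0], [names[j] for j in nbrs[0]]))

    descs = [np.array([[1.0, 0.0, 0.0], [0.9, 0.0, 0.1]]),
             np.array([[0.0, 1.0, 0.0]]),
             np.array([[0.0, 0.0, 1.0]])]
    imgs = np.array([[1.0, 0.2, 0.0], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]])
    labels = np.array([0, 1, 1])  # third image (true class 1) scores highest for class 2
    s = class_scores(imgs, descs)
    print("top-1:", topk_accuracy(s, labels, 1), "top-2:", topk_accuracy(s, labels, 2))
```

In the toy run the third image is misclassified at top-1 but recovered at top-2, which is the kind of near-miss among similar classes that comparative descriptors are meant to resolve.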
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 5274-5283 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798331510831 |
| State | Published - 2025 |
| Event | 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, United States, 28 Feb 2025 → 4 Mar 2025 |
Publication series
| Name | Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025 |
|---|---|
Conference
| Conference | 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 |
|---|---|
| Country/Territory | United States |
| City | Tucson |
| Period | 28/02/25 → 4/03/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3 Good Health and Well-being
Fingerprint
Dive into the research topics of 'Enhancing Visual Classification Using Comparative Descriptors'.