Abstract
Zero-Shot Anomaly Detection (ZSAD) aims to identify anomalies in unseen categories or scenarios. Recently, Vision-Language Models (VLMs), most notably CLIP, have been utilized to enhance anomaly detection performance. However, CLIP struggles to capture local anomalies, which has led to the development of additional modules that significantly increase model complexity and computational overhead. To address this challenge, we propose AULoRA, a novel approach that enhances anomaly understanding by integrating Low-Rank Adaptation (LoRA) into CLIP’s visual encoder and efficiently injecting visual context into the textual representation. While preserving CLIP’s general visual knowledge, we utilize Singular Value Decomposition (SVD) to selectively fine-tune only the most relevant singular components, enabling precise identification of semantic anomalies. Nevertheless, anomaly detection often requires capturing highly diverse and category-specific characteristics, which simple text prompts alone struggle to represent adequately. To overcome this, we adapt textual representations based on the visual context extracted from input images, allowing the model to achieve category-aware and anomaly-sensitive alignment. AULoRA maintains the original architecture and inference efficiency of CLIP, while achieving state-of-the-art performance on both image-level and pixel-level anomaly detection benchmarks across diverse industrial datasets.
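The SVD-based selective fine-tuning described above can be sketched as follows. This is a minimal illustrative example, not the paper's exact procedure: it assumes a frozen weight matrix is split via SVD into trainable top-r singular factors (the "most relevant singular components") plus a frozen residual, so that only the low-rank factors would receive gradient updates. The function name `svd_split` and the rank `r` are hypothetical.

```python
import numpy as np

def svd_split(W: np.ndarray, r: int):
    """Sketch of SVD-based selective adaptation (assumed, not the paper's code).

    Decomposes a frozen weight W into trainable factors (A, B) built from the
    top-r singular triplets, plus a frozen residual W_res. Fine-tuning would
    then update only A and B, preserving the rest of W's knowledge in W_res.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # (d_out, r): left singular vectors scaled by singular values
    B = Vt[:r, :]          # (r, d_in): right singular vectors
    W_res = W - A @ B      # frozen residual carrying the remaining components
    return A, B, W_res

# At initialization the adapted layer W_res @ x + A @ (B @ x) reproduces W @ x exactly,
# so the pretrained behavior is preserved before any fine-tuning step.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
A, B, W_res = svd_split(W, r=2)
x = rng.standard_normal(6)
assert np.allclose(W_res @ x + A @ (B @ x), W @ x)
```

Because the split is exact at initialization, the adapter adds no new parameters to the inference path: A @ B can be folded back into the weight after training, consistent with the abstract's claim that CLIP's original architecture and inference efficiency are maintained.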
| Original language | English |
|---|---|
| Pages (from-to) | 170095-170105 |
| Number of pages | 11 |
| Journal | IEEE Access |
| Volume | 13 |
| DOIs | |
| State | Published - 2025 |
Keywords
- CLIP
- zero-shot anomaly detection
- low-rank adaptation
- parameter-efficient fine-tuning
- singular value decomposition
Fingerprint
Dive into the research topics of 'AULoRA: Anomaly Understanding With Low-Rank Adaptation for Zero-Shot Anomaly Detection'. Together they form a unique fingerprint.