Daily Paper Cast

Are Vision-Language Models Truly Understanding Multi-vision Sensor?

Update: 2025-01-03

Description

🤗 Upvotes: 9 | cs.CV



Authors:

Sangyun Chung, Youngjoon Yu, Youngchae Chee, Se Yeon Kim, Byung-Kwan Lee, Yong Man Ro



Title:

Are Vision-Language Models Truly Understanding Multi-vision Sensor?



arXiv:

http://arxiv.org/abs/2412.20750v1



Abstract:

Large-scale Vision-Language Models (VLMs) have advanced by aligning vision inputs with text, significantly improving performance in computer vision tasks. Moreover, for VLMs to be effectively utilized in real-world applications, an understanding of diverse multi-vision sensor data, such as thermal, depth, and X-ray information, is essential. However, we find that current VLMs process multi-vision sensor images without deep understanding of sensor information, disregarding each sensor's unique physical properties. This limitation restricts their capacity to interpret and respond to complex questions requiring multi-vision sensor reasoning. To address this, we propose a novel Multi-vision Sensor Perception and Reasoning (MS-PR) benchmark, assessing VLMs on their capacity for sensor-specific reasoning. Moreover, we introduce Diverse Negative Attributes (DNA) optimization to enable VLMs to perform deep reasoning on multi-vision sensor tasks, helping to bridge the core information gap between images and sensor data. Extensive experimental results validate that the proposed DNA method can significantly improve the multi-vision sensor reasoning for VLMs.
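The abstract does not spell out how Diverse Negative Attributes (DNA) optimization works, but its stated goal, steering the model toward sensor-grounded answers and away from answers that ignore each sensor's physical properties, suggests a preference-style training objective. The sketch below is only an assumption-based illustration of that idea; the function name `dna_style_loss`, the DPO-like margin, and the tensor shapes are illustrative choices, not details taken from the paper.

```python
# Assumption-based sketch, not the paper's implementation: score one
# sensor-grounded ("positive") answer against several diverse negative
# answers that treat, e.g., a thermal image as if it were ordinary RGB.
import torch
import torch.nn.functional as F

def dna_style_loss(logp_pos: torch.Tensor, logp_negs: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    """logp_pos: (batch,) log-likelihood of the sensor-aware answer.
    logp_negs: (batch, k) log-likelihoods of k negative answers."""
    # Preference-style margin of the positive answer over each negative.
    margins = beta * (logp_pos.unsqueeze(1) - logp_negs)   # (batch, k)
    # Encourage the model to prefer the positive answer over every negative.
    return -F.logsigmoid(margins).mean()

# Toy usage with random log-likelihoods (4 questions, 3 negatives each).
loss = dna_style_loss(torch.randn(4), torch.randn(4, 3))
print(loss.item())
```

Using several negatives per question, rather than a single rejected answer, is presumably what the "diverse" in DNA refers to; the exact objective used in the paper may differ.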

Hosts: Jingwen Liang, Gengyu Wang