HCI Deep Dives

AHs 2025 GazeLLM: Multimodal LLMs incorporating Human Visual Attention

Update: 2025-12-27

Description

Processing high-resolution video with AI requires massive computational resources. GazeLLM offers an elegant solution inspired by human vision: use eye-tracking to focus only on what matters. By cropping first-person video to a small region around the user's gaze point, the system reduces pixel input to roughly one-tenth while achieving task comprehension equal to or better than full-resolution video. User evaluations across six real-world activities, including cooking, bike repair, first aid, and sports, showed that gaze-focused video produces higher quality task descriptions than both full videos and center-cropped alternatives.
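To make the core idea concrete, here is a minimal sketch of gaze-centered cropping. The function name, the fixed square crop, and the per-side fraction (sqrt(0.1), so the crop keeps about one-tenth of the pixels) are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def gaze_crop(frame, gaze_xy, crop_frac=0.316):
    """Crop a region centered on the gaze point.

    crop_frac ~ sqrt(0.1) per side, so the crop holds roughly
    one-tenth of the original pixels (illustrative value; the
    paper's exact crop size may differ).
    """
    h, w = frame.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    gx, gy = gaze_xy
    # Clamp the window so it stays fully inside the frame,
    # even when the gaze point is near an edge.
    x0 = min(max(gx - cw // 2, 0), w - cw)
    y0 = min(max(gy - ch // 2, 0), h - ch)
    return frame[y0:y0 + ch, x0:x0 + cw]

# Example: crop a 1080p first-person frame around a gaze point.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
crop = gaze_crop(frame, gaze_xy=(960, 540))
print(crop.shape)  # the crop holds ~1/10 of the original pixels
```

Only the cropped frames would then be passed to the multimodal LLM, which is where the order-of-magnitude reduction in visual input comes from.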



Jun Rekimoto. 2025. GazeLLM: Multimodal LLMs incorporating Human Visual Attention. In Proceedings of the Augmented Humans International Conference 2025 (AHs '25). Association for Computing Machinery, New York, NY, USA, 10 pages. https://doi.org/10.1145/3745900.3746075


Kai Kunze