Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Update: 2025-01-11

Description

We propose Gaze-LLE, a transformer framework for gaze target estimation that uses a frozen DINOv2 encoder to streamline feature extraction and achieves state-of-the-art performance across multiple benchmarks.

https://arxiv.org/abs/2412.09586

YouTube: https://www.youtube.com/@ArxivPapers

TikTok: https://www.tiktok.com/@arxiv_papers

Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016

Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

Igor Melnyk
