SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Update: 2025-12-31

Description

🤗 Upvotes: 33 | cs.CL, cs.AI, cs.CV, cs.LG, cs.MA

Authors:

Shaofei Cai, Yulei Qin, Haojia Lin, Zihan Xu, Gang Li, Yuchen Shi, Zongyi Li, Yong Mao, Siqi Cai, Xiaoyu Tan, Yitao Liang, Ke Li, Xing Sun

Title:

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Arxiv:

http://arxiv.org/abs/2512.22322v1

Abstract:

Agentic reinforcement learning (RL) holds great promise for the development of autonomous agents under complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing task verification is treated as a passive, post-hoc process: a verifier (i.e., a rule-based scoring script, a reward or critic model, or an LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine whether the agent succeeded. Processing such verbose context, which contains irrelevant, noisy history, poses challenges to the verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from this passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: not only to complete a task but also to prove its accomplishment with curated snapshot evidence. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its access to the online environment to perform self-verification on a minimal, decisive set of snapshots. This evidence is provided as the sole material for a general LLM-as-a-Judge verifier to determine its validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that our SmartSnap paradigm allows training LLM-driven agents in a scalable manner, bringing performance gains of up to 26.08% and 16.66% to 8B and 30B models, respectively. The synergy between solution finding and evidence seeking facilitates the cultivation of efficient, self-verifying agents with competitive performance against DeepSeek V3.1 and Qwen3-235B-A22B.
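
To make the idea of a "minimal, decisive set of snapshots" concrete, here is a toy Python sketch. It is not the paper's implementation: the `Snapshot` type, the `facts` field, the greedy set-cover selection in `curate_evidence`, and the rule-based `judge` stand-in are all assumptions made for illustration. A real verifier would be an LLM-as-a-Judge reading screenshots, and the sketch models only Completeness and Conciseness, not Creativity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of one step: which task-relevant facts are visible."""
    step: int
    facts: frozenset


def curate_evidence(trajectory, required_facts):
    """Greedily pick a small set of snapshots that together show every required fact.

    Completeness: keep adding snapshots until all required facts are covered.
    Conciseness: each pick is the snapshot covering the most still-missing facts.
    """
    remaining = set(required_facts)
    chosen = []
    while remaining:
        best = max(trajectory, key=lambda s: len(s.facts & remaining))
        gain = best.facts & remaining
        if not gain:  # no snapshot shows the missing facts; stop early
            break
        chosen.append(best)
        remaining -= gain
    return sorted(chosen, key=lambda s: s.step)


def judge(evidence, required_facts):
    """Stand-in for the verifier: it sees only the curated evidence, not the trajectory."""
    shown = set().union(*(s.facts for s in evidence))
    return set(required_facts) <= shown


# Toy task (hypothetical): set an alarm for 07:00 and enable it.
required = {"alarm_time_0700", "alarm_enabled"}
trajectory = [
    Snapshot(0, frozenset()),
    Snapshot(1, frozenset({"clock_app_open"})),
    Snapshot(2, frozenset({"alarm_time_0700"})),
    Snapshot(3, frozenset()),
    Snapshot(4, frozenset({"alarm_time_0700", "alarm_enabled"})),
]

evidence = curate_evidence(trajectory, required)
verdict = judge(evidence, required)
```

In this toy run the verifier receives one snapshot instead of the five-step trajectory, which is the cost and noise reduction the abstract describes.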

Jingwen Liang, Gengyu Wang