HuggingFace 每日AI论文速递
562 Episodes
[Sponsor] Listen to AI Weekly Talk on your commute. Every week it recaps the past week's major AI news. 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 5 papers in this episode:
[00:40] TOP1 (🔥309) | 🧠 FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
[02:58] TOP2 (🔥302) | 🚁 CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
[05:23] TOP3 (🔥170) | 🛡 ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers
[07:56] TOP4 (🔥151) | 🎬 ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
[10:17] TOP5 (🔥147) | 🧠 Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
[Follow Us] You can also find us, with more than just podcast content, on Xiaohongshu: AI速递
[Contents] The 15 papers in this episode:
[00:41] 🔄 DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models
[01:48] 🧠 The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
[02:45] 🧠 SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
[03:22] 🎮 Generative World Renderer
[04:09] 👁 EgoSim: Egocentric World Simulator for Embodied Interaction Generation
[05:24] 🧠 LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
[06:06] 🧠 Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
[06:47] 🚗 UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
[07:35] 🎯 Steerable Visual Representations
[08:12] 🎬 VOID: Video Object and Interaction Deletion
[09:06] 🤖 Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time
[09:47] 🚀 ASI-Evolve: AI Accelerates AI
[10:50] 🎭 Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
[11:36] 🤖 GPA: Learning GUI Process Automation from Demonstrations
[12:24] 🔍 VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification
[Contents] The 15 papers in this episode:
[00:27] 🛡 ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers
[01:20] 💻 Terminal Agents Suffice for Enterprise Automation
[02:03] 📊 MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
[02:54] 🧠 ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
[03:40] 🔬 Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification
[04:26] 📊 QuitoBench: A High-Quality Open Time Series Forecasting Benchmark
[05:12] 🧠 Reasoning Shift: How Context Silently Shortens LLM Reasoning
[05:59] 📊 HippoCamp: Benchmarking Contextual Agents on Personal Computers
[06:52] 🧠 PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
[07:34] ⚡ Universal YOCO for Efficient Depth Scaling
[08:12] 🔄 Brevity Constraints Reverse Performance Hierarchies in Language Models
[08:48] 🧠 GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
[09:25] 📝 Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers
[10:11] 🚀 Embarrassingly Simple Self-Distillation Improves Code Generation
[10:54] 🤖 Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants
[Contents] The 15 papers in this episode:
[00:30] 🧠 FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
[01:12] 🧩 LongCat-Next: Lexicalizing Modalities as Discrete Tokens
[01:48] 🚁 CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
[02:31] 🧬 Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells
[03:33] 🤖 GEMS: Agent-Native Multimodal Generation with Memory and Skills
[04:12] 🎬 VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward
[05:04] 🤖 Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
[05:45] 🔬 daVinci-LLM: Towards the Science of Pretraining
[06:19] 🎬 CutClaw: Agentic Hours-Long Video Editing via Music Synchronization
[07:10] 🔍 MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
[07:58] 🧬 FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration
[08:46] 🏙 Extend3D: Town-Scale 3D Generation
[09:28] 💭 Think Anywhere in Code Generation
[10:18] ⚙ OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training
[11:03] 🎨 VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing
[Contents] The 15 papers in this episode:
[00:30] 🚀 TAPS: Task Aware Proposal Distributions for Speculative Sampling
[01:11] 🔬 Towards a Medical AI Scientist
[02:03] 🔍 Gen-Searcher: Reinforcing Agentic Search for Image Generation
[02:43] ⚠ Emergent Social Intelligence Risks in Generative Multi-Agent Systems
[03:22] ⚙ EpochX: Building the Infrastructure for an Emergent Agent Civilization
[04:01] 📊 GEditBench v2: A Human-Aligned Benchmark for General Image Editing
[05:00] 🧠 On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models
[05:56] 🔬 PRBench: End-to-end Paper Reproduction in Physics Research
[06:37] 🧠 Make Geometry Matter for Spatial Reasoning
[07:28] 🖼 ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
[08:18] 🎨 On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers
[09:11] 🧠 MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences
[09:55] ⚡ Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
[10:55] 🎯 ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning
[12:07] 🔍 Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
[Contents] The 10 papers in this episode:
[00:28] 🎬 ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
[01:07] 🎬 PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
[01:54] 🧠 Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
[02:43] 📊 RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation
[03:53] 🚗 LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
[04:42] 🧠 Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models
[05:25] 🛠 Natural-Language Agent Harnesses
[06:10] 🎤 Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models
[06:59] 🔬 MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies
[07:46] 🚀 Diffutron: A Masked Diffusion Language Model for Turkish Language
[Contents] The 5 papers in this episode:
[00:49] TOP1 (🔥124) | 🔍 MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding
[03:11] TOP2 (🔥122) | 🧪 Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
[05:47] TOP3 (🔥114) | 🚀 Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
[07:54] TOP4 (🔥104) | 🎬 Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models
[10:09] TOP5 (🔥104) | 🔗 HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
[Contents] The 15 papers in this episode:
[00:35] 😊 PixelSmile: Toward Fine-Grained Facial Expression Editing
[01:27] 🚀 Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
[02:10] 🖼 RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models
[02:52] 🖼 MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data
[03:42] ⚙ Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration
[04:25] 🗣 Voxtral TTS (an expressive, multilingual text-to-speech model with a hybrid architecture)
[05:03] 📉 SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
[05:49] 🧠 MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
[06:39] 🎬 AVControl: Efficient Framework for Training Audio-Visual Controls
[07:23] 🎨 Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
[08:10] 🔍 MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
[09:12] 🔍 Representation Alignment for Just Image Transformers is not Easier than You Think
[10:06] ⚡ S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
[10:46] 📊 FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol
[11:35] 🔬 BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment
[Contents] The 15 papers in this episode:
[00:27] 🎬 CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
[01:24] 🎬 EVA: Efficient Reinforcement Learning for End-to-End Video Agent
[02:05] 🛡 T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
[02:50] 🤖 UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
[03:33] 🤔 Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
[04:20] 🎮 GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents
[05:13] 🧠 When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
[06:11] 🤖 CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare
[07:13] 🌀 4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video
[07:54] 🎬 OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
[08:38] 🚗 Toward Physically Consistent Driving Video World Models under Challenging Trajectories
[09:18] 📊 Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments
[10:10] 🧠 Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
[10:53] 🤖 StreamingClaw Technical Report
[11:30] 🔍 LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis
[Contents] The 15 papers in this episode:
[00:29] 🔍 MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding
[01:18] 🎮 WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
[02:10] ⚡ SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
[02:59] 🎥 PEARL: Personalized Streaming Video Understanding Model
[03:46] 🔍 DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models
[04:30] 📊 From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
[05:13] 🤖 SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM
[05:52] 🧠 UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation
[06:45] 🎬 RealMaster: Lifting Rendered Scenes into Photorealistic Video
[07:32] 🤖 2Xplat: Two Experts Are Better Than One Generalist
[08:15] 🔍 Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
[09:03] 👁 Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing
[09:57] 🎯 VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
[10:48] 🧠 ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model
[11:40] 🤖 AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI
[Contents] The 15 papers in this episode:
[00:32] 🧪 Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
[01:13] 🚀 Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
[01:55] 🧠 LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
[02:42] 🔍 VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
[03:30] 🧠 SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning
[04:10] 🎯 F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting
[05:03] 🎬 Manifold-Aware Exploration for Reinforcement Learning in Video Generation
[05:56] ⚖ mSFT: Addressing Dataset Mixtures Overfiting Heterogeneously in Multi-task SFT
[06:46] 🧠 Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection
[07:35] 🔄 Repurposing Geometric Foundation Models for Multi-view Diffusion
[08:21] 🤖 RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models
[09:15] 🔍 OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
[10:02] 💭 BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs
[10:54] ⚖ SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
[11:43] 🧭 On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
[Contents] The 15 papers in this episode:
[00:31] 🔗 HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
[01:28] 🎬 Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models
[02:06] 🛰 TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
[02:56] 🔍 ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models
[03:45] 🎬 LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
[04:50] 🏠 FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow
[05:35] 🧠 The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus
[06:20] 🎯 A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
[07:02] 🔍 How Well Does Generative Recommendation Generalize?
[07:48] 🌍 WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
[08:24] ⚡ BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection
[09:05] 🚀 Hyperagents (self-editing, metacognitive, self-improving agents)
[09:54] 🎬 HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering
[10:37] 🎬 EgoForge: Goal-Directed Egocentric World Simulator
[11:50] 🎬 Versatile Editing of Video Content, Actions, and Dynamics without Training
[Contents] The 5 papers in this episode:
[00:42] TOP1 (🔥344) | 🧠 Demystifing Video Reasoning
[02:20] TOP2 (🔥289) | 🏭 InCoder-32B: Code Foundation Model for Industrial Scenarios
[04:37] TOP3 (🔥263) | 🧠 AI Can Learn Scientific Taste
[06:33] TOP4 (🔥238) | 🗣 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
[08:31] TOP5 (🔥166) | 🤖 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
[Contents] The 15 papers in this episode:
[00:29] 🧠 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
[01:09] 🎬 SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
[01:45] ⚡ FASTER: Rethinking Real-Time Flow VLAs
[02:30] 🎬 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model
[03:31] 🤖 Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
[04:21] 🤖 MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
[05:13] 🧩 Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
[05:47] 📊 LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
[06:42] 🧠 Memento-Skills: Let Agents Design Agents
[07:18] 🌍 F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
[08:00] 🧠 Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
[08:54] 🧠 Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
[09:45] 🎬 EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing
[10:58] 🔧 VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
[11:39] 🗣 MOSS-TTS Technical Report
[Contents] The 15 papers in this episode:
[00:30] 🔮 Video-CoE: Reinforcing Video Event Prediction via Chain of Events
[01:13] 🧬 MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
[02:01] 🧠 MosaicMem: Hybrid Spatial Memory for Controllable Video World Models
[02:55] ⚖ Alignment Makes Language Models Normative, Not Descriptive
[03:42] 🧠 Complementary Reinforcement Learning
[04:33] 🤖 Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
[05:24] 🤖 GigaWorld-Policy: An Efficient Action-Centered World--Action Model
[06:07] 🎬 Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models
[06:54] 🤖 When AI Navigates the Fog of War (a temporal case study of the early stages of the 2026 Middle East conflict)
[07:49] 🧩 LoST: Level of Semantics Tokenization for 3D Shapes
[08:21] 🧠 BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs
[09:09] 🧠 ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
[09:47] 🤖 Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting
[10:46] 🎥 Stereo World Model: Camera-Guided Stereo Video Generation
[11:32] 🧠 AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents
[Contents] The 15 papers in this episode:
[00:29] 🤖 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
[01:10] 🏭 InCoder-32B: Code Foundation Model for Industrial Scenarios
[02:08] 🧠 Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
[02:50] 🤖 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
[03:28] 🧠 Demystifing Video Reasoning
[04:26] 🎮 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
[05:26] 🧠 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
[06:12] 🤔 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
[07:02] 🔄 Online Experiential Learning for Language Models
[07:54] 📊 FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
[08:47] 🚀 Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training
[09:30] 🧭 WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation
[10:20] 🔍 AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents
[11:03] 🎨 SegviGen: Repurposing 3D Generative Model for Part Segmentation
[11:59] 🗣 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
[Contents] The 15 papers in this episode:
[00:29] 🧠 AI Can Learn Scientific Taste
[01:13] 🔍 OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
[02:06] 🏢 EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
[03:00] 🌆 Grounding World Simulation Models in a Real-World Metropolis
[03:53] 🤖 HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions
[04:39] 🧠 Attention Residuals
[05:38] 🧠 Mixture-of-Depths Attention
[06:44] 🧠 Effective Distillation to Hybrid xLSTM Architectures
[07:23] 🔍 Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models
[08:14] 🎬 ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer
[08:54] 🚀 POLCA: Stochastic Generative Optimization with LLM
[10:00] 🤖 Safe and Scalable Web Agent Learning via Recreated Websites
[10:45] 🔍 Make it SING: Analyzing Semantic Invariants in Classifiers
[11:28] ⏱ TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
[12:30] 🎬 WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics
[Contents] The 15 papers in this episode:
[00:28] 🧠 LMEB: Long-horizon Memory Embedding Benchmark
[01:12] 🔄 Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
[01:59] 🐳 daVinci-Env: Open SWE Environment Synthesis at Scale
[02:46] 🔍 Can Vision-Language Models Solve the Shell Game?
[03:26] ⚡ OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
[04:14] 🎯 Visual-ERM: Reward Modeling for Visual Equivalence
[05:11] 🔍 MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning
[06:18] 🌉 V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration
[07:05] 🔍 Multimodal OCR: Parse Anything from Documents
[07:49] 🧠 Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
[08:22] ⚠ HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios
[09:13] 🔍 From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
[09:59] ⚡ HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration
[11:04] 🧠 Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation
[11:54] 🎬 VQQA: An Agentic Approach for Video Evaluation and Quality Improvement
[Contents] The 5 papers in this episode:
[00:50] TOP1 (🔥136) | 🎨 Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
[02:57] TOP2 (🔥104) | 🐧 Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
[04:30] TOP3 (🔥90) | 🤖 OpenClaw-RL: Train Any Agent Simply by Talking
[06:24] TOP4 (🔥81) | 📖 Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
[08:02] TOP5 (🔥77) | 🧠 Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
[Contents] The 15 papers in this episode:
[00:32] 🧠 Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
[01:17] 🤔 Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
[02:11] ⚡ IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
[02:54] 🎬 Video-Based Reward Modeling for Computer-Use Agents
[03:55] 🎬 DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
[04:46] 🎯 Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
[05:40] 🎬 DVD: Deterministic Video Depth Estimation with Generative Priors
[06:29] 🖼 WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
[07:29] 🎬 ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation
[08:24] 🧠 GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
[09:08] 🎬 EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
[09:55] ⚡ One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
[10:46] 🤖 OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
[11:29] 🧠 EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models
[12:37] 🧠 XSkill: Continual Learning from Experience and Skills in Multimodal Agents







