HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

398 Episodes
This episode covers the following 10 papers:
[00:29] TOP1 (🔥640) | 🤝 Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
[02:49] TOP2 (🔥341) | 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
[04:59] TOP3 (🔥218) | 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
[07:07] TOP4 (🔥212) | 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
[09:17] TOP5 (🔥207) | 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
[11:19] TOP6 (🔥183) | 🤔 Why Language Models Hallucinate
[13:06] TOP7 (🔥174) | 🧠 A Survey of Reinforcement Learning for Large Reasoning Models
[15:32] TOP8 (🔥160) | 🎬 LongLive: Real-time Interactive Long Video Generation
[18:13] TOP9 (🔥145) | 💡 Reverse-Engineered Reasoning for Open-Ended Generation
[20:27] TOP10 (🔥140) | 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递
This episode covers the following 15 papers:
[00:20] 🎮 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
[00:59] 🔥 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
[01:36] 🐣 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
[02:10] 🤥 TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
[02:55] 🌊 OceanGym: A Benchmark Environment for Underwater Embodied Agents
[03:41] ⚡ DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
[04:14] 🔍 Who's Your Judge? On the Detectability of LLM-Generated Judgments
[04:59] ✂ Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
[05:45] 👁 Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
[06:24] 🧠 Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
[07:09] 🧪 VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
[07:42] ⚡ dParallel: Learnable Parallel Decoding for dLLMs
[08:28] 🎯 IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
[09:15] 🎬 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
[10:12] 🐬 Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
This episode covers the following 15 papers:
[00:22] ⚡ SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
[01:05] 🗣 StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
[01:54] 🎮 Multiplayer Nash Preference Optimization
[02:57] 🔗 RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
[03:44] 🎨 OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
[04:28] 🧠 Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
[05:05] 🧩 Visual Jigsaw Post-Training Improves MLLMs
[05:37] 🎬 SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
[06:15] 🔬 Democratizing AI scientists using ToolUniverse
[06:59] 🧠 When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
[07:31] 📊 GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
[08:04] 🖼 EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
[08:54] 🚀 SparseD: Sparse Attention for Diffusion Language Models
[09:40] 🎛 EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
[10:32] 🧠 Towards Personalized Deep Research: Benchmarks and Evaluations
This episode covers the following 15 papers:
[00:20] 🎬 LongLive: Real-time Interactive Long Video Generation
[00:56] 🎯 Quantile Advantage Estimation for Entropy-Safe Reasoning
[01:34] 📄 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
[02:11] 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
[03:08] 🧠 Variational Reasoning for Language Models
[03:37] 💬 Language Models Can Learn from Verbal Feedback Without Scalar Rewards
[04:32] 🔍 ReviewScore: Misinformed Peer Review Detection with Large Language Models
[05:12] 🎯 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
[05:49] 🪄 MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
[06:32] 🎯 No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
[07:14] 🗣 VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
[07:58] 🧭 UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
[08:29] 🖼 LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
[09:16] 🌐 WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
[09:49] 🔄 SPARK: Synergistic Policy And Reward Co-Evolving Framework
This episode covers the following 5 papers:
[00:38] TOP1 (🔥116) | 📜 Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
[02:43] TOP2 (🔥113) | 🌐 Qwen3-Omni Technical Report
[05:23] TOP3 (🔥112) | 🗺 RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
[07:45] TOP4 (🔥104) | 📈 VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
[10:05] TOP5 (🔥89) | 🚀 LIMI: Less is More for Agency
This episode covers the following 15 papers:
[00:20] 🔬 SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
[01:00] 🧠 MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
[01:41] 📈 VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
[02:26] 🌳 Tree Search for LLM Agent Reinforcement Learning
[03:06] 🖼 Seedream 4.0: Toward Next-generation Multimodal Image Generation
[03:40] 🎯 Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
[04:29] 🤖 AutoIntent: AutoML for Text Classification
[05:10] ⚖ TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
[05:43] 🎢 CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
[06:30] 🖼 Does FLUX Already Know How to Perform Physically Plausible Image Composition?
[07:31] ✂ CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling
[08:26] 🧠 Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution
[09:12] 🎮 V-GameGym: Visual Game Generation for Code Large Language Models
[09:49] 🗣 Interactive Recommendation Agent with Active User Commands
[10:22] 🔍 BESPOKE: Benchmark for Search-Augmented Large Language Model Personalization via Diagnostic Feedback
This episode covers the following 10 papers:
[00:22] 🎥 Video models are zero-shot learners and reasoners
[01:09] 🧠 SIM-CoT: Supervised Implicit Chain-of-Thought
[01:55] 🪶 EmbeddingGemma: Powerful and Lightweight Text Representations
[02:29] 🗣 Advancing Speech Understanding in Speech-Aware Language Models with GRPO
[03:06] 🌍 LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines
[03:52] 🎬 EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
[04:29] 🌀 Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
[05:19] 🎬 PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
[05:58] 📄 Logics-Parsing Technical Report
[06:44] 🤖 On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub
This episode covers the following 15 papers:
[00:24] 📜 Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
[00:58] 🚀 Reinforcement Learning on Pre-Training Data
[01:37] 👁 Do You Need Proprioceptive States in Visuomotor Policies?
[02:36] 🚀 MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
[03:24] 🎯 MAPO: Mixed Advantage Policy Optimization
[04:06] 🚀 Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
[04:44] 🎯 VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
[05:31] 🌌 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
[06:08] 🧩 What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
[06:41] 🗣 Large Language Models Discriminate Against Speakers of German Dialects
[07:32] 📊 OpenGVL - Benchmarking Visual Temporal Progress for Data Curation
[08:19] 🪄 HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis
[09:07] 🛠 CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
[09:41] 🛰 Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications
[10:28] 🌍 VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
This episode covers the following 15 papers:
[00:21] 🚀 LIMI: Less is More for Agency
[00:55] 🎬 OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
[01:28] 🧩 OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System
[02:19] 🌐 Qwen3-Omni Technical Report
[02:55] 🎬 TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
[03:28] 📐 GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
[04:15] 🎯 DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[05:05] 🤖 ByteWrist: A Parallel Robotic Wrist Enabling Flexible and Anthropomorphic Motion for Confined Spaces
[05:42] 💬 EpiCache: Episodic KV Cache Management for Long Conversational Question Answering
[06:24] 🧠 SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
[07:01] 🧠 FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions
[08:05] 🎬 VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models
[08:53] 🧪 ARE: Scaling Up Agent Environments and Evaluations
[09:28] 🧩 QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
[10:17] 🔍 Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
This episode covers the following 13 papers:
[00:25] 🗺 RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
[01:00] 🌉 MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
[01:42] 🧩 Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
[02:25] 🎯 BaseReward: A Strong Baseline for Multimodal Reward Model
[02:56] 🏠 SPATIALGEN: Layout-guided 3D Indoor Scene Generation
[03:46] 🧠 BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
[04:30] 🎭 Lynx: Towards High-Fidelity Personalized Video Generation
[05:20] 🤖 A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
[05:54] 📹 RGB-Only Supervised Camera Parameter Optimization in Dynamic Scenes
[06:21] 🗣 Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems
[07:07] 🎬 Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
[07:50] 🗣 WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
[08:30] 🗣 Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue
This episode covers the following 5 papers:
[00:43] TOP1 (🔥95) | 🌍 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
[02:51] TOP2 (🔥93) | 🔍 WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
[05:09] TOP3 (🔥91) | 🤖 Scaling Agents via Continual Pre-training
[07:33] TOP4 (🔥88) | 🖥 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
[10:48] TOP5 (🔥79) | 🌊 FlowRL: Matching Reward Distributions for LLM Reasoning
This episode covers the following 15 papers:
[00:26] 🖥 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
[01:01] 🌊 FlowRL: Matching Reward Distributions for LLM Reasoning
[01:57] 🧭 Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
[02:55] 🧬 Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
[03:34] 🎨 Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
[04:12] 🔍 FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
[04:56] 🤖 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
[05:39] 🔮 AToken: A Unified Tokenizer for Vision
[06:10] 🌌 WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
[06:58] 🖼 MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
[07:54] 🎮 RecoWorld: Building Simulated Environments for Agentic Recommender Systems
[08:28] 🎯 Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
[09:03] 🔍 Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs
[09:51] 🩺 EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
[10:34] 🛰 FSG-Net: Frequency-Spatial Synergistic Gated Network for High-Resolution Remote Sensing Change Detection
This episode covers the following 14 papers:
[00:19] 🐪 Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale
[00:56] 🚀 SAIL-VL2 Technical Report
[01:42] 🌐 PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era
[02:33] 🎓 GenExam: A Multidisciplinary Text-to-Image Exam
[03:25] 🧹 Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
[03:59] 🩺 MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework
[04:37] 🔍 MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
[05:22] 🎭 Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
[05:59] 🧮 THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
[06:40] 🔍 Improving Context Fidelity via Native Retrieval-Augmented Reasoning
[07:20] 🌍 AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
[08:13] 🎛 SteeringControl: Holistic Evaluation of Alignment Steering in LLMs
[08:48] ⚛ Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks
[09:37] 🚀 Hybrid Quantum-Classical Model for Image Classification
This episode covers the following 11 papers:
[00:27] 🔍 WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
[01:08] 🤖 Scaling Agents via Continual Pre-training
[01:52] ⛵ WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
[02:36] 🧠 Towards General Agentic Intelligence via Environment Scaling
[03:09] 🔍 WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
[03:59] 🧠 ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
[04:39] 🚀 Single-stream Policy Optimization
[05:19] 🎮 Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation
[06:00] 🧩 3D Aware Region Prompted Vision Language Model
[06:36] 💡 EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
[07:07] ⚛ Exact Coset Sampling for Quantum Lattice Algorithms
This episode covers the following 14 papers:
[00:24] 🌍 OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
[01:12] 🤖 UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
[01:51] 🏠 InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
[02:27] 🖱 LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
[02:58] 📊 Locality in Image Diffusion Models Emerges from Data Statistics
[03:29] 🤔 Measuring Epistemic Humility in Multimodal Large Language Models
[03:57] 🤖 Nav-R1: Reasoning and Navigation in Embodied Scenes
[04:25] 🔍 Lost in Embeddings: Information Loss in Vision-Language Models
[04:54] 🌐 CognitiveSky: Scalable Sentiment and Narrative Analysis for Decentralized Social Media
[05:19] 🔍 Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
[05:57] 🧠 EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
[06:30] ⚖ Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
[07:16] 🧠 PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
[07:52] 🔍 GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
This episode covers the following 14 papers:
[00:25] 📚 IntrEx: A Dataset for Modeling Engagement in Educational Conversations
[01:02] 📏 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
[01:54] 🧩 X-Part: high fidelity and structure coherent shape decomposition
[02:33] 🖼 InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
[03:04] 🔍 HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering
[03:50] 🎙 VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
[04:44] 🌸 FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
[05:20] 🎨 Inpainting-Guided Policy Optimization for Diffusion Large Language Models
[05:58] 🤖 Virtual Agent Economies
[06:28] 📈 QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
[07:02] 🧪 MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
[07:41] 🎨 Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
[08:31] 🦎 LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
[09:13] 🗞 CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China
This episode covers the following 5 papers:
[00:40] TOP1 (🔥455) | 🤝 Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
[03:19] TOP2 (🔥163) | 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
[05:44] TOP3 (🔥156) | 🤔 Why Language Models Hallucinate
[07:57] TOP4 (🔥139) | 💡 Reverse-Engineered Reasoning for Open-Ended Generation
[10:35] TOP5 (🔥131) | 🧠 A Survey of Reinforcement Learning for Large Reasoning Models
This episode covers the following 15 papers:
[00:27] 🎭 HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
[01:18] 🤖 SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
[02:02] 🗣 EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
[02:37] 🎭 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
[03:11] 🧭 Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
[03:57] 🎨 FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
[04:34] 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
[05:14] 🔄 Can Understanding and Generation Truly Benefit Together -- or Just Coexist?
[05:46] 📹 SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
[06:16] 📊 Visual Programmability: A Guide for Code-as-Thought in Chart Understanding
[06:55] 🕵 Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
[07:35] 🖼 2D Gaussian Splatting with Semantic Alignment for Image Inpainting
[08:10] 📏 LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
[08:45] 🤖 OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
[09:31] 🎯 The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
This episode covers the following 10 papers:
[00:24] 🧠 A Survey of Reinforcement Learning for Large Reasoning Models
[00:45] 🔄 RewardDance: Reward Scaling in Visual Generation
[01:08] 🌐 3D and 4D World Modeling: A Survey
[01:41] 🤖 AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
[02:08] 🧩 P3-SAM: Native 3D Part Segmentation
[02:40] 🌐 Hunyuan-MT Technical Report
[03:08] ⚠ <think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
[03:44] 🤖 EnvX: Agentize Everything with Agentic AI
[04:13] 🤔 The Majority is not always right: RL training for solution aggregation
[04:33] 🤖 HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
This episode covers the following 14 papers:
[00:22] 🧠 Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
[00:50] 🔍 Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
[01:15] 👁 Visual Representation Alignment for Multimodal Large Language Models
[01:54] 🔄 Reconstruction Alignment Improves Unified Multimodal Models
[02:19] 🔄 UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
[02:46] 🧠 Curia: A Multi-Modal Foundation Model for Radiology
[03:06] 🔮 F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
[03:33] 🧠 Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
[03:56] 🔄 Language Self-Play For Data-Free Training
[04:22] 🔍 Causal Attention with Lookahead Keys
[04:43] 🎨 Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
[05:07] ✅ SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
[05:30] 🚀 Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
[06:01] 📈 $ΔL$ Normalization: Rethink Loss Aggregation in RLVR