HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)
418 Episodes
This episode covers the following 15 papers:

[00:19] 🧠 Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
[00:59] ⚖ BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
[01:40] 🧠 LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
[02:18] 🌍 GigaBrain-0: A World Model-Powered Vision-Language-Action Model
[02:49] 🔄 Language Models are Injective and Hence Invertible
[03:25] 📹 VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos
[04:01] 📲 DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents
[04:55] 🚀 Unified Reinforcement and Imitation Learning for Vision-Language Models
[05:28] 🖼 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
[06:17] 📊 FinSight: Towards Real-World Financial Deep Research
[07:06] 🧠 Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
[07:43] 🌍 OmniNWM: Omniscient Driving Navigation World Models
[08:28] 🕳 Attention Sinks in Diffusion Language Models
[09:04] 📄 olmOCR 2: Unit Test Rewards for Document OCR
[09:42] 🧠 KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

[Follow us] You can also find us on the following platform for more beyond the podcast — Xiaohongshu: AI速递
This episode covers the following 14 papers:

[00:19] 🧠 LightMem: Lightweight and Efficient Memory-Augmented Generation
[00:55] 🌀 World-in-World: World Models in a Closed-Loop World
[01:44] 🖼 UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
[02:29] 🧪 Chem-R: Learning to Reason as a Chemist
[03:10] 🎬 MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
[03:52] 🔍 Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
[04:49] 🎬 IF-VidCap: Can Video Caption Models Follow Instructions?
[05:35] 🚀 Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
[06:21] 🎬 MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
[07:12] 🧠 ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
[07:43] 🎬 MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
[08:18] 🎯 ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
[09:29] 🎬 UltraGen: High-Resolution Video Generation with Hierarchical Attention
[10:15] 🔄 DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
This episode covers the following 13 papers:

[00:21] 🪞 PICABench: How Far Are We from Physically Realistic Image Editing?
[01:04] 🤖 DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
[01:50] 🗜 Glyph: Scaling Context Windows via Visual-Text Compression
[02:23] 🔍 Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
[03:10] 🔗 When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
[04:09] 🎯 Annotation-Efficient Universal Honesty Alignment
[04:49] 🖌 Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
[05:46] 👁 RL makes MLLMs see better than SFT
[06:33] 🚀 Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling
[07:09] 🎨 ConsistEdit: Highly Consistent and Precise Training-free Visual Editing
[07:56] 🔄 Deep Self-Evolving Reasoning
[08:22] 🧠 Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI
[09:07] 🔮 Chronos-2: From Univariate to Universal Forecasting
This episode covers the following 15 papers:

[00:20] 🧠 A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
[01:04] 🌐 OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
[01:44] 🎬 Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
[02:28] ✂ NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
[03:05] 🛰 Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
[03:41] ⚠ Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
[04:18] 🧬 Latent Diffusion Model without Variational Autoencoder
[04:52] 📸 LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
[05:30] 🧠 MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
[06:14] 🧠 A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
[06:56] 🗣 Language Models Model Language
[07:36] 🖼 BLIP3o-NEXT: Next Frontier of Native Image Generation
[08:30] 🌐 Paper2Web: Let's Make Your Paper Alive!
[09:12] 🔬 Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition
[09:55] 🔍 Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
This episode covers the following 5 papers:

[00:40] TOP1 (🔥154) | 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
[02:19] TOP2 (🔥138) | 🧠 Diffusion Transformers with Representation Autoencoders
[04:54] TOP3 (🔥134) | 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
[07:55] TOP4 (🔥125) | 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
[10:30] TOP5 (🔥110) | 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
This episode covers the following 11 papers:

[00:25] 👓 AI for Service: Proactive Assistance with AI Glasses
[01:06] 🎬 ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
[01:43] 🎯 LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
[02:33] 🧩 TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
[03:35] 🧠 Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
[04:04] ⚡ Attention Is All You Need for KV Cache in Diffusion LLMs
[04:45] 🤥 When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
[05:33] 📄 PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
[06:13] 🧠 VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
[06:52] 📐 MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning
[07:39] 🧠 COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes
This episode covers the following 15 papers:

[00:21] 🎧 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
[00:57] 🔍 Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
[01:38] ⚡ FlashWorld: High-quality 3D Scene Generation within Seconds
[02:06] 🐝 Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
[02:37] 🗣 InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
[03:24] 🌍 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
[04:00] 🧪 LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
[04:43] 🚗 CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
[05:21] 🔍 Generative Universal Verifier as Multimodal Meta-Reasoner
[06:07] ⚖ ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
[06:43] 🎞 Trace Anything: Representing Any Video in 4D via Trajectory Fields
[07:27] 🌍 Reasoning in Space via Grounding in the World
[07:54] 🧠 The Role of Computing Resources in Publishing Foundation Model Research
[08:28] ⚖ UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
[09:05] 🤖 InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
This episode covers the following 14 papers:

[00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
[00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
[01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning
[02:29] 🎯 Detect Anything via Next Point Prediction
[03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
[03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
[04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs
[05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
[05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
[06:35] 🤖 Robot Learning: A Tutorial
[07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
[08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
[09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation
[09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
This episode covers the following 15 papers:

[00:23] 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
[01:22] 🧠 Diffusion Transformers with Representation Autoencoders
[02:12] 🎬 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
[02:41] 🔄 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
[03:18] 🌊 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
[04:11] 🔍 Spotlight on Token Perception for Multimodal Reinforcement Learning
[04:50] 🎬 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
[05:25] 🌐 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
[05:56] 🧠 Demystifying Reinforcement Learning in Agentic Reasoning
[06:51] 🧮 Making Mathematical Reasoning Adaptive
[07:26] 🛡 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
[08:05] 🧠 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
[08:43] 🎨 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
[09:23] 🧾 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
[10:09] 🧠 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
This episode covers the following 14 papers:

[00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
[01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
[01:56] 🎨 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
[02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
[03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!
[03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
[04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
[04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
[05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams
[06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone
[06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
[07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
[08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
[08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
This episode covers the following 5 papers:

[00:33] TOP1 (🔥300) | 🧠 Less is More: Recursive Reasoning with Tiny Networks
[02:16] TOP2 (🔥164) | 🌱 Agent Learning via Early Experience
[04:15] TOP3 (🔥105) | 🧠 Apriel-1.5-15b-Thinker (a 15B open-source model achieving frontier multimodal reasoning)
[06:17] TOP4 (🔥97) | 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
[08:45] TOP5 (🔥88) | 🎬 Paper2Video: Automatic Video Generation from Scientific Papers
This episode covers the following 14 papers:

[00:16] 🌱 Agent Learning via Early Experience
[00:50] 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
[01:42] 🧪 From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
[02:19] 🎬 UniVideo: Unified Understanding, Generation, and Editing for Videos
[03:01] 🧠 When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
[03:43] 🧠 Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
[04:25] 🧠 MemMamba: Rethinking Memory Patterns in State Space Model
[05:17] 🛡 The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
[05:53] 🎯 Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
[06:40] 🧪 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
[07:17] 🪚 DeepPrune: Parallel Scaling without Inter-trace Redundancy
[07:54] 🚀 Training-Free Group Relative Policy Optimization
[08:24] 🪄 ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
[08:55] 🤥 LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions
This episode covers the following 15 papers:

[00:21] 🔄 Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
[00:59] 🧠 Cache-to-Cache: Direct Semantic Communication Between Large Language Models
[01:32] 🌀 Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
[02:07] 🧠 SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
[03:06] 🤖 RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
[04:02] 🎬 MATRIX: Mask Track Alignment for Interaction-aware Video Generation
[04:51] 🎯 Vibe Checker: Aligning Code Evaluation with Human Preference
[05:44] 🤖 Multi-Agent Tool-Integrated Policy Optimization
[06:24] 🧠 CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling
[06:59] ✂ OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
[07:52] 🧠 Artificial Hippocampus Networks for Efficient Long-Context Modeling
[08:30] 🔍 Revisiting Long-context Modeling from Context Denoising Perspective
[09:11] 🧠 Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
[09:51] 💥 Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
[10:37] ⚡ Native Hybrid Attention for Efficient Sequence Modeling
This episode covers the following 15 papers:

[00:24] 📊 TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
[00:57] 🔍 Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
[01:39] 🚀 Fast-dLLM v2: Efficient Block-Diffusion LLM
[02:30] 🧑 CoDA: Coding LM via Diffusion Adaptation
[03:01] 🧩 Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
[03:52] ⚖ ASPO: Asymmetric Importance Sampling Policy Optimization
[04:34] 🔗 Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
[05:15] 🧠 AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems
[05:51] 🪂 Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
[06:35] 🌍 HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
[07:22] ⚡ TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation
[08:09] 🎯 Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
[09:00] 🩺 Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
[09:46] 🧠 MixReasoning: Switching Modes to Think
[10:20] ⚡ LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
This episode covers the following 15 papers:

[00:21] 🎬 Paper2Video: Automatic Video Generation from Scientific Papers
[00:55] 🎬 Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
[01:38] 🎬 VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
[02:14] 👻 Imperceptible Jailbreaking against Large Language Models
[02:56] 🌳 MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
[03:30] 🧬 Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
[04:07] 📊 Factuality Matters: When Image Generation and Editing Meet Structured Visuals
[04:59] 🔄 Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
[05:55] ⚖ Judging with Confidence: Calibrating Autoraters to Preference Distributions
[06:44] 🎯 Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
[07:27] 📏 Optimal Scaling Needs Optimal Norm
[07:51] 🔬 Code4MeV2: a Research-oriented Code-completion Platform
[08:31] 🪞 Self-Reflective Generation at Test Time
[09:15] 🔄 SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
[10:00] 👀 Watch and Learn: Learning to Use Computers from Online Videos
This episode covers the following 15 papers:

[00:28] 🧠 Apriel-1.5-15b-Thinker (a 15B open-source model achieving frontier multimodal reasoning)
[01:04] 🚀 Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
[01:42] 🧩 Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
[02:19] 🪞 Self-Improvement in Multimodal Large Language Models: A Survey
[02:59] 🧬 Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
[03:38] 📊 CoDA: Agentic Systems for Collaborative Data Visualization
[04:21] 🧐 SurveyBench: How Well Can LLM(-Agents) Write Academic Surveys?
[05:06] 🔧 REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration
[05:53] 🔍 OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
[06:38] 🔍 FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
[07:14] 🎯 Improving GUI Grounding with Explicit Position-to-Coordinate Mapping
[08:05] 📏 LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning
[08:45] 🤖 WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
[09:19] 🍱 Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
[09:54] 🎯 LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models
This episode covers the following 5 papers:

[00:43] TOP1 (🔥323) | 🐣 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
[02:38] TOP2 (🔥167) | 🎬 LongLive: Real-time Interactive Long Video Generation
[05:04] TOP3 (🔥150) | 🔥 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
[07:24] TOP4 (🔥124) | 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
[09:18] TOP5 (🔥122) | 🎮 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
This episode covers the following 15 papers:

[00:22] 🗜 LongCodeZip: Compress Long Context for Code Language Models
[00:56] 🎬 Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
[01:38] 🧠 ExGRPO: Learning to Reason from Experience
[02:32] 🥷 StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions
[03:32] 🎛 Interactive Training: Feedback-Driven Neural Network Optimization
[04:24] 📈 StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?
[05:07] 🔍 VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning
[05:44] 🪓 The Rogue Scalpel: Activation Steering Compromises LLM Safety
[06:21] 🔍 CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
[07:09] 🔍 ModernVBERT: Towards Smaller Visual Document Retrievers
[07:54] 🗺 RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
[08:37] 🚀 F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data
[09:13] 🧠 RLP: Reinforcement as a Pretraining Objective
[09:45] 🖱 DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing
[10:19] 🚀 The Unreasonable Effectiveness of Scaling Agents for Computer Use
This episode covers the following 15 papers:

[00:19] 🧠 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
[01:20] 🤖 GEM: A Gym for Agentic LLMs
[01:57] 🧠 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
[02:36] 🎒 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
[03:06] 🎬 Code2Video: A Code-centric Paradigm for Educational Video Generation
[03:41] ⚙ PIPer: On-Device Environment Setup via Online Reinforcement Learning
[04:11] 🗜 ACON: Optimizing Context Compression for Long-horizon LLM Agents
[04:52] 🔍 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
[05:22] ⚖ BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
[06:01] ⚡ Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
[06:42] 🚀 BroRL: Scaling Reinforcement Learning via Broadened Exploration
[07:25] 📊 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
[08:02] 🎯 On Predictability of Reinforcement Learning Dynamics for Large Language Models
[08:31] 🖥 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
[09:17] 🧠 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
This episode covers the following 10 papers:

[00:29] TOP1 (🔥640) | 🤝 Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
[02:49] TOP2 (🔥341) | 🔒 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
[04:59] TOP3 (🔥218) | 🤖 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
[07:07] TOP4 (🔥212) | 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
[09:17] TOP5 (🔥207) | 🤔 Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
[11:19] TOP6 (🔥183) | 🤔 Why Language Models Hallucinate
[13:06] TOP7 (🔥174) | 🧠 A Survey of Reinforcement Learning for Large Reasoning Models
[15:32] TOP8 (🔥160) | 🎬 LongLive: Real-time Interactive Long Video Generation
[18:13] TOP9 (🔥145) | 💡 Reverse-Engineered Reasoning for Open-Ended Generation
[20:27] TOP10 (🔥140) | 🤖 A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers







