HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)
514 Episodes
[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, AI每周谈 recaps the previous week's major AI news. Link 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:32] 🩺 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
[01:17] 🧭 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
[02:03] 📈 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
[02:47] 🎯 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
[03:48] ⚖ MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
[04:33] 🤖 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
[05:14] 🧠 Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
[06:07] 🧮 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
[06:46] 🎯 POINTS-GUI-G: GUI-Grounding Journey
[07:45] 🧠 MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
[08:29] 🧠 Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
[09:18] 🎵 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
[09:59] ⚡ Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
[11:02] 🧠 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
[11:49] 🧠 PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks
[Follow us] You can also find us on the following platform for more information beyond the podcast:
小红书 (Xiaohongshu): AI速递

[Contents] The 5 papers in this episode:
[00:48] TOP1(🔥235) | 🤖 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
[02:54] TOP2(🔥235) | 🧠 ERNIE 5.0 Technical Report
[05:14] TOP3(🔥206) | 🤖 Kimi K2.5: Visual Agentic Intelligence
[07:49] TOP4(🔥147) | 🔍 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
[10:28] TOP5(🔥137) | 🍌 PaperBanana: Automating Academic Illustration for AI Scientists

[Contents] The 15 papers in this episode:
[00:29] 📊 Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
[01:20] 🎬 Context Forcing: Consistent Autoregressive Video Generation with Long Context
[02:11] 🧠 RISE-Video: Can Video Generators Decode Implicit World Rules?
[02:57] 🔮 ProAct: Agentic Lookahead in Interactive Environments
[03:47] ⚡ Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
[04:39] 🧭 Steering LLMs via Scalable Interactive Oversight
[05:27] 🧠 Grounding and Enhancing Informativeness and Utility in Dataset Distillation
[06:13] 🧪 Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
[07:07] 🔍 Semantic Search over 9 Million Mathematical Theorems
[07:57] 🕷 Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening
[08:39] 🧪 CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
[09:30] 🤖 InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
[10:22] 🎬 Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
[11:14] 🔄 SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs
[12:20] 🔍 SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

[Contents] The 15 papers in this episode:
[00:29] 🧠 ERNIE 5.0 Technical Report
[01:11] ⚡ FASA: Frequency-aware Sparse Attention
[02:01] 📊 Training Data Efficiency in Multimodal Process Reward Models
[02:44] 🤖 WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
[03:28] ⚡ OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
[04:21] ⚡ HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
[05:02] 🤖 EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
[06:05] 🎬 Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
[06:59] 🤖 SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation
[07:44] 🔍 TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents
[08:21] 🧠 Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
[09:12] 🤖 Rethinking the Trust Region in LLM Reinforcement Learning
[09:54] ♻ Residual Context Diffusion Language Models
[10:40] 🧱 HY3D-Bench: Generation of 3D Assets
[11:34] 🎨 AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

[Contents] The 15 papers in this episode:
[00:32] 👁 CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
[01:18] 🤖 AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
[02:01] 🔍 No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
[02:43] 🔗 daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
[03:23] 🧠 Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
[04:06] 🎬 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
[04:56] 🤖 MARS: Modular Agent with Reflective Search for Automated AI Research
[05:41] 📊 CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
[06:25] ⚡ Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
[07:19] 🤖 SWE-World: Building Software Engineering Agents in Docker-Free Environments
[08:09] 🤖 SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
[09:14] 📊 Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation
[10:08] ⚡ Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
[10:59] 🎯 Unified Personalized Reward Model for Vision Generation
[11:47] 🔍 WideSeek: Advancing Wide Research via Multi-Agent Scaling

[Contents] The 15 papers in this episode:
[00:32] 🤖 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
[01:24] 🤖 Kimi K2.5: Visual Agentic Intelligence
[02:09] 🔍 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
[03:08] 🔍 Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
[03:57] 🔄 Closing the Loop: Universal Repository Representation with RPG-Encoder
[04:39] 🧠 UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
[05:23] 📊 WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora
[06:28] 📚 FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
[07:23] 🚀 SWE-Universe: Scale Real-World Verifiable Environments to Millions
[08:13] 📚 Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles
[08:58] ⚖ SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization
[09:45] 🎨 PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
[10:38] ⚙ RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
[11:30] 🧠 Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
[12:17] 🎬 PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

[Contents] The 15 papers in this episode:
[00:33] 🤖 ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
[01:22] 🛡 THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
[02:18] 🧠 TTCS: Test-Time Curriculum Synthesis for Self-Evolving
[03:09] 🍌 PaperBanana: Automating Academic Illustration for AI Scientists
[03:51] 🔬 FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation
[04:40] 🧠 ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought
[05:22] 🎯 SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization
[06:02] 🎯 DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
[07:08] 🧠 Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification
[07:55] 📄 PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
[08:45] 🎬 DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning
[09:42] 🧠 MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
[10:24] 🦢 Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
[11:13] 📊 Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling
[12:00] ⚡ RM-RF: Reward Model for Run-Free Unit Test Evaluation

[Contents] The 10 papers in this episode:
[00:42] TOP1(🔥292) | 🧠 mHC: Manifold-Constrained Hyper-Connections
[03:06] TOP2(🔥212) | 📈 GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
[04:45] TOP3(🔥209) | 🔍 Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
[06:59] TOP4(🔥193) | 👶 BabyVision: Visual Reasoning Beyond Language
[08:57] TOP5(🔥190) | 🚀 STEP3-VL-10B Technical Report
[10:39] TOP6(🔥186) | 🤖 Agentic Reasoning for Large Language Models
[12:58] TOP7(🔥181) | 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
[15:19] TOP8(🔥171) | 🧠 LongCat-Flash-Thinking-2601 Technical Report
[17:22] TOP9(🔥165) | 🗺 Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
[19:17] TOP10(🔥158) | 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives

[Contents] The 5 papers in this episode:
[00:39] TOP1(🔥181) | 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
[02:50] TOP2(🔥169) | 🧠 LongCat-Flash-Thinking-2601 Technical Report
[04:51] TOP3(🔥138) | 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
[06:40] TOP4(🔥123) | 🤖 daVinci-Dev: Agent-native Mid-training for Software Engineering
[08:51] TOP5(🔥120) | 🛡 AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

[Contents] The 15 papers in this episode:
[00:29] 🧭 Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
[01:21] 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
[02:19] ⚡ Scaling Embeddings Outperforms Scaling Experts in Language Models
[02:58] 🔍 OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
[03:39] 🤖 DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
[04:33] 🧠 MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
[05:20] 🔺 PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction
[06:08] 🧠 ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation
[07:01] 🧩 AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts
[07:43] 🧠 Exploring Reasoning Reward Model for Agents
[08:39] 🎤 Qwen3-ASR Technical Report
[09:27] 🚀 Language-based Trial and Error Falls Behind in the Era of Experience
[10:16] 🌐 Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models
[11:02] ⚡ Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
[11:59] 🧠 MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

[Contents] The 13 papers in this episode:
[00:33] 🧠 Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
[01:21] 🌍 Advancing Open-source World Models
[01:55] 🧠 DeepSeek-OCR 2: Visual Causal Flow
[02:58] 🚀 Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
[03:49] 🔬 Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
[04:34] 🔄 Linear representations in language models can change dramatically over a conversation
[05:26] 🚀 SERA: Soft-Verified Efficient Repository Agents
[06:01] 🤖 OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
[06:46] 🤖 GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection
[07:37] 🗣 SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
[08:27] 📊 RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation
[09:16] ✏ SketchDynamics: Exploring Free-Form Sketches for Dynamic Intent Expression in Animation Generation
[10:07] 🚀 UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

[Contents] The 14 papers in this episode:
[00:30] 🛡 AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
[01:21] 🧩 AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
[02:11] 🤖 A Pragmatic VLA Foundation Model
[02:56] 🧠 Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models
[03:39] 🌍 World Craft: Agentic Framework to Create Visualizable Worlds via Text
[04:26] 🧠 AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking
[05:08] 🌲 FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning
[05:55] 🛡 TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
[06:44] 🎯 Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection
[07:17] ⚡ Revisiting Parameter Server in LLM Post-Training
[08:00] 🧠 Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
[08:38] 🧬 GPCR-Filter: a deep learning framework for efficient and precise GPCR modulator discovery
[09:39] ⚠ HalluCitation Matters: Revealing the Impact of Hallucinated References with 300 Hallucinated Papers in ACL Conferences
[10:38] 📊 Benchmarks Saturate When The Model Gets Smarter Than The Judge

[Contents] The 15 papers in this episode:
[00:33] 🤖 daVinci-Dev: Agent-native Mid-training for Software Engineering
[01:21] 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
[02:21] 🎬 The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation
[03:08] 🔬 Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
[04:00] 🔬 iFSQ: Improving FSQ for Image Generation with 1 Line of Code
[04:42] ⚡ Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
[05:36] 🎬 Self-Refining Video Sampling
[06:31] 🧠 Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
[07:23] 🎤 VIBEVOICE-ASR Technical Report
[08:06] 📊 CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval
[09:04] 📊 STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion
[09:51] 🧠 Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents
[10:26] 🚀 AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation
[11:15] 🔍 SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback
[12:04] 🤖 Agentic Very Long Video Understanding

[Contents] The 15 papers in this episode:
[00:32] 🧠 LongCat-Flash-Thinking-2601 Technical Report
[01:13] ✂ SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
[02:08] 🧠 TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
[02:58] 🧠 VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
[03:58] 🧬 Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
[04:40] ⚡ Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
[05:32] ⚡ SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
[06:11] 🧠 MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
[06:55] 🎬 Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
[07:43] 🧠 Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
[08:22] 🚀 Endless Terminals: Scaling RL Environments for Terminal Agents
[09:09] 🧪 DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
[10:11] 🧠 Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
[10:58] 💻 Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
[11:39] ⚖ Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain

[Contents] The 5 papers in this episode:
[00:44] TOP1(🔥159) | 🤖 Agentic Reasoning for Large Language Models
[03:02] TOP2(🔥138) | ⚖ Your Group-Relative Advantage Is Biased
[05:37] TOP3(🔥71) | 🤖 Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
[08:18] TOP4(🔥63) | 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
[10:14] TOP5(🔥62) | ⚙ ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

[Contents] The 15 papers in this episode:
[00:32] 🤖 BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
[01:22] ⚠ The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
[02:26] 🎥 HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
[03:14] 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
[04:02] 🧪 LLM-in-Sandbox Elicits General Agentic Intelligence
[04:54] 🚀 Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
[05:34] 🎭 SAMTok: Representing Any Mask with Two Words
[06:30] 🚀 Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
[07:23] 🔬 Learning to Discover at Test Time
[08:08] 🔍 Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
[09:06] ⚙ Towards Automated Kernel Generation in the Era of LLMs
[09:48] 🔄 OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
[10:45] 💻 Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
[11:29] 🗣 Qwen3-TTS Technical Report
[12:13] 🤖 Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

[Contents] The 15 papers in this episode:
[00:30] 🤖 Agentic Reasoning for Large Language Models
[01:05] 🤖 Rethinking Video Generation Model for the Embodied World
[01:43] 🤖 Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance
[02:34] 📊 MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
[03:24] 🧠 Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
[04:03] 📄 Typhoon OCR: Open Vision-Language Model For Thai Document Extraction
[04:51] 🛡 FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
[05:41] ⚡ Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition
[06:45] 🔍 XR: Cross-Modal Agents for Composed Image Retrieval
[07:29] 🔊 Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
[08:19] 🤖 Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics
[09:15] 🤖 RoboBrain 2.5: Depth in Sight, Time in Mind
[10:16] 🔍 Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
[10:59] 🧠 AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
[11:43] 🕳 The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems

[Contents] The 15 papers in this episode:
[00:31] 🤖 Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
[01:15] 🔮 FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs
[02:11] ⚡ Toward Efficient Agents: Memory, Tool learning, and Planning
[02:51] 🤖 Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
[03:40] 🎬 OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer
[04:28] 🧠 MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
[05:15] 🧠 Think3D: Thinking with Space for Spatial Reasoning
[06:06] 🫁 UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
[07:08] ⚙ ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
[07:58] 🧠 Aligning Agentic World Models via Knowledgeable Experience Learning
[08:45] 🤖 Agentic-R: Learning to Retrieve for Agentic Search
[09:25] 🔤 LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
[10:14] 📊 PRiSM: Benchmarking Phone Realization in Speech Models
[11:02] 🔍 On the Evidentiary Limits of Membership Inference for Copyright Auditing
[11:46] 🔒 Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD

[Contents] The 8 papers in this episode:
[00:30] ⚙ ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
[01:15] 🧠 Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
[02:13] 🕺 CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation
[03:01] 🧭 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
[03:30] 🧠 Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs
[04:21] 🔬 SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature
[05:08] 🧭 YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation
[05:56] 🧬 Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

[Contents] The 15 papers in this episode:
[00:33] ⚖ Your Group-Relative Advantage Is Biased
[01:20] 🍎 The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
[02:08] 🛠 Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
[03:14] 📊 RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation
[04:20] 🤔 When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs
[05:18] 🤖 ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
[06:07] 🚧 BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
[07:04] 🎯 ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
[08:01] 🤖 FrankenMotion: Part-level Human Motion Generation and Composition
[08:54] 🧠 Reasoning Models Generate Societies of Thought
[09:40] 🤖 PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
[10:27] 🔍 Building Production-Ready Probes For Gemini
[11:21] ⚙ PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models
[12:31] 🧊 ShapeR: Robust Conditional 3D Shape Generation from Casual Captures
[13:24] 🚀 AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems







