2025.10.01 | Self-play training with zero annotations; in-depth evaluation of MCP agents
Description
The 15 papers in this episode are:
[00:20] 🎮 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
[00:59] 🔥 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
[01:36] 🐣 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
[02:10] 🤥 TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
[02:55] 🌊 OceanGym: A Benchmark Environment for Underwater Embodied Agents
[03:41] ⚡ DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
[04:14] 🔍 Who's Your Judge? On the Detectability of LLM-Generated Judgments
[04:59] ✂ Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
[05:45] 👁 Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
[06:24] 🧠 Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
[07:09] 🧪 VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
[07:42] ⚡ dParallel: Learnable Parallel Decoding for dLLMs
[08:28] 🎯 IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
[09:15] 🎬 MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
[10:12] 🐬 Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
【Follow Us】
You can also find us on the following platform for more than what the podcast covers:
Xiaohongshu: AI速递










