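2025.10.14 | Quantization error turned into reward, training a 32B model on a single GPU; an audio-visual evaluation benchmark for multimodal LLMs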
Description
The 15 papers in this episode are as follows:
[00:23] 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
[01:22] 🧠 Diffusion Transformers with Representation Autoencoders
[02:12] 🎬 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
[02:41] 🔄 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
[03:18] 🌊 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment (a latent flow environment)
[04:11] 🔍 Spotlight on Token Perception for Multimodal Reinforcement Learning
[04:50] 🎬 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
[05:25] 🌐 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training (mixing perspective and panoramic data)
[05:56] 🧠 Demystifying Reinforcement Learning in Agentic Reasoning
[06:51] 🧮 Making Mathematical Reasoning Adaptive
[07:26] 🛡 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data (a pre-execution safety framework)
[08:05] 🧠 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
[08:43] 🎨 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models (SVG understanding, editing, and generation)
[09:23] 🧾 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
[10:09] 🧠 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递