2025.11.27 | Russian-language multimodal evaluation fills a gap; latent collaboration speeds up by 14%
Description
The 15 papers in this episode are:
[00:22] 🔍 Multimodal Evaluation of Russian-language Architectures
[01:15] 🧠 Latent Collaboration in Multi-Agent Systems
[01:47] 🌍 Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
[02:18] 🎭 Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
[03:10] 📄 NVIDIA Nemotron Parse 1.1
[03:46] 🧠 Monet: Reasoning in Latent Visual Space Beyond Images and Language
[04:25] ⚡ Terminal Velocity Matching
[05:03] 📊 Revisiting Generalization Across Difficulty Levels: It's Not So Easy
[05:42] 🤖 MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
[06:25] ⚡ Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs
[06:59] 🎮 UniGame: Turning a Unified Multimodal Model Into Its Own Adversary
[07:47] 🧩 SPHINX: A Synthetic Environment for Visual Perception and Reasoning
[08:33] ⚡ Block Cascading: Training Free Acceleration of Block-Causal Video Models
[09:12] 🏙 RAISECity: A Multimodal Agent Framework for Reality-Aligned 3D World Generation at City-Scale
[09:58] 📊 I-GLIDE: Input Groups for Latent Health Indicators in Degradation Estimation
【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递





