2025.10.28 | Point Transformer无标对齐长空间;代码递归统一粗细粒度
Description
本期的 15 篇论文如下:
[00:23 ] 🎼 Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations(Concerto:2D-3D联合自监督学习涌现空间表征)
[01:06 ] 🧩 ReCode: Unify Plan and Action for Universal Granularity Control(ReCode:用递归代码统一规划与行动,实现通用粒度控制)
[01:44 ] 🤖 A Survey of Data Agents: Emerging Paradigm or Overstated Hype?(数据智能体全景透视:新范式还是泡沫?)
[02:23 ] 🌾 FARMER: Flow AutoRegressive Transformer over Pixels(基于像素流自回归变换器的可逆生成模型)
[03:07 ] 🤖 VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting(VITA-E:能同时看、听、说、做的自然具身交互框架)
[03:45 ] 🎭 Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation(前瞻锚定:在音频驱动人体动画中保持角色身份)
[04:17 ] 🤖 ACG: Action Coherence Guidance for Flow-based VLA models(面向流式VLA模型的动作连贯性引导)
[04:56 ] 🔍 $\text{E}^2\text{Rank}$: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker(E²Rank:你的文本嵌入也能成为高效列表级重排器)
[05:40 ] 🌐 Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences(全模态奖励模型:用自由格式偏好迈向通用奖励建模)
[06:30 ] 🔍 PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity(PixelRefer:任意粒度时空目标指代的统一框架)
[07:06 ] 🧠 Knocking-Heads Attention(敲头注意力:让多头彼此“敲一敲”)
[07:42 ] 🧩 IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction(IGGT:面向语义三维重建的实例锚定几何Transformer)
[08:30 ] 🎯 The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation(多选一最优:用max@k优化将强化学习与Best-of-N采样对齐)
[09:14 ] 🥯 LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation(LightBagel:面向统一多模态理解与生成的轻量级双重融合框架)
[09:51 ] 🧠 LimRank: Less is More for Reasoning-Intensive Information Reranking(LimRank:少即是多的推理密集型信息重排序)
<figure>
</figure>【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递






