Provably Learning from Language Feedback
Description
This paper introduces a new formal framework called Learning from Language Feedback (LLF), which addresses the challenge of training AI agents, particularly large language models (LLMs), using rich natural language critiques and guidance instead of traditional scalar rewards. The authors formalize the LLF problem and introduce the transfer eluder dimension as a complexity measure to quantify how effectively language feedback reduces uncertainty about latent rewards, demonstrating cases where learning can be exponentially faster than reward-only methods. They propose a no-regret algorithm called HELiX that provably solves LLF problems and empirically show that a practical implementation using LLMs outperforms greedy baselines across several environments. Overall, the work establishes a theoretical foundation for designing principled interactive learning algorithms that leverage generic language feedback, positioning LLF as a broad paradigm encompassing existing reinforcement learning models.