AI Unraveling Human Intentions
Description
In this episode, we look at how AI can better understand and assist humans in everyday tasks. Our discussion focuses on a limitation of current AI systems: they struggle to follow natural language instructions in collaborative environments, where human intentions often remain implicit. We introduce FISER (Follow Instructions with Social and Embodied Reasoning), a framework designed to bridge this gap by having the AI infer human goals and intentions through social reasoning before it acts on an instruction. We explore the use of Transformer-based models in collaborative AI systems and discuss the results of testing FISER on the HandMeThat benchmark, where it achieves state-of-the-art performance. Tune in to learn how this approach could move AI beyond literal commands and toward a shared understanding with the people it works with.
Original paper:
Wan, Y., Wu, Y., Wang, Y., Mao, J., & Jaques, N. (2024). Infer Human’s Intentions Before Following Natural Language Instructions. https://arxiv.org/abs/2409.18073