AsgardBench: A benchmark for visually grounded interactive planning
Andrea Tupini, Lars Liden, Reuben Tan, Yu Wang, Jianfeng Gao
At a glance
- To complete tasks successfully, embodied AI agents must ground their plans in visual feedback and update them as that feedback changes.
- AsgardBench isolates whether agents can use visual observations to revise their plans as tasks unfold.
- Spanning 108 controlled task instances across 12 task types, the benchmark requires agents to adapt their plans based on what they observe.
- Because objects can be in different positions and states (e.g., clean or dirty), the same instruction can require different...
