AsgardBench: A benchmark for visually grounded interactive planning

At a glance

To successfully complete tasks, embodied AI agents must ground and update their plans based on visual feedback.
AsgardBench isolates whether agents can use visual observations to revise their plans as tasks unfold.
Spanning 108 controlled task instances across 12 task types, the benchmark requires agents to adapt their plans based on what they observe.
Because objects can be in different positions and states (e.g., clean or dirty), the same instruction can require different..