At a glance

  • To successfully complete tasks, embodied AI agents must ground and update their plans based on visual feedback.
  • AsgardBench isolates whether agents can use visual observations to revise their plans as tasks unfold.
  • Spanning 108 controlled task instances across 12 task types, the benchmark requires agents to adapt their plans based on what they observe.
  • Because objects can be in different positions and states (e.g., clean or dirty), the same instruction can require different...
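
As a toy illustration of the last two points, the sketch below shows how one instruction can unfold into different steps depending on what the agent observes. It is not AsgardBench's actual task set, environment, or API: the `Scene` class, the mug-and-coffee-machine task, and every function name are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scene:
    """Toy world state (hypothetical; not the benchmark's representation)."""
    mug_dirty: bool
    mug_held: bool = False
    mug_in_machine: bool = False


def observe(scene: Scene) -> dict:
    """Stand-in for a visual observation: report what is currently visible."""
    return {
        "mug_dirty": scene.mug_dirty,
        "mug_held": scene.mug_held,
        "mug_in_machine": scene.mug_in_machine,
    }


def choose_action(obs: dict) -> Optional[str]:
    """Pick the next step from the latest observation, not from a fixed script."""
    if obs["mug_in_machine"]:
        return None  # goal reached, nothing left to do
    if not obs["mug_held"]:
        return "pick up mug"
    if obs["mug_dirty"]:
        return "wash mug"  # extra step needed only when the mug looks dirty
    return "place mug in coffee machine"


def apply(scene: Scene, action: str) -> None:
    """Update the toy world to reflect the chosen action."""
    if action == "pick up mug":
        scene.mug_held = True
    elif action == "wash mug":
        scene.mug_dirty = False
    elif action == "place mug in coffee machine":
        scene.mug_held = False
        scene.mug_in_machine = True


def run(scene: Scene) -> List[str]:
    """Observe, act, and repeat until the instruction is satisfied."""
    actions: List[str] = []
    while (action := choose_action(observe(scene))) is not None:
        apply(scene, action)
        actions.append(action)
    return actions


# Same instruction ("put the mug in the coffee machine"), two starting states.
clean_steps = run(Scene(mug_dirty=False))
dirty_steps = run(Scene(mug_dirty=True))
```

Here the clean-mug scene takes two steps and the dirty-mug scene takes three, because the agent re-checks its observation before each action instead of following one fixed plan.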