What Happened

BigFinanceBench (928 expert-authored tasks) and Hedge-Bench (102 real hedge-fund analyst tasks) dropped simultaneously, giving the market its first rigorous, rubric-graded measurement of where AI agents actually stand. Best-in-class models hit 58.8% on BigFinanceBench, and below 16% on the harder hedge-fund tasks. Both benchmarks grade the derivation, not just the final answer, which makes the results harder to game and more credible to institutional buyers.

Who Gets Hit

Positive