ADeLe: Predicting and explaining AI performance across tasks
Lexin Zhou, Xing Xie
At a glance
- AI benchmarks report performance on specific tasks but provide limited insight into underlying capabilities; ADeLe evaluates models by scoring both tasks and models across 18 core abilities, enabling direct comparison between task demands and model capabilities.
- Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such as GPT-4o and Llama-3.1.
- It builds ability profiles and identifies where models are likely to...
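As a rough illustration of the demand-vs-capability comparison described above, the sketch below assumes a simple decision rule: a model is predicted to succeed on a task when its ability score meets or exceeds the task's demand level on every dimension. The ability names, 0–5 scale, and numeric values are all hypothetical, and this rule is a simplification, not ADeLe's actual predictor.

```python
# Hypothetical sketch of a demand-vs-ability comparison (not ADeLe's
# actual method): predict success when the model's ability meets or
# exceeds the task's demand on every dimension.

def predict_success(model_abilities: dict[str, float],
                    task_demands: dict[str, float]) -> bool:
    """Return True if the model's ability covers every task demand."""
    return all(model_abilities.get(dim, 0.0) >= level
               for dim, level in task_demands.items())

# Toy profiles on an invented 0-5 scale, for illustration only.
model = {"reasoning": 3.5, "knowledge": 4.0, "metacognition": 2.5}
easy_task = {"reasoning": 3.0, "knowledge": 3.5}
hard_task = {"reasoning": 4.5, "metacognition": 3.0}

print(predict_success(model, easy_task))  # True
print(predict_success(model, hard_task))  # False
```

Comparing per-dimension scores like this is what lets such a profile explain *why* a failure is expected (here, the hard task's reasoning demand exceeds the model's reasoning score), rather than only reporting an aggregate benchmark number.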
