ChatGPT and Gemini Can't Do Real Physics Research

Here is a rather interesting post on whether AI can do the kind of real physics research that beginning graduate students are expected to do. More than 50 physicists from over 30 institutions built the "CritPt" benchmark ... The benchmark asks models to solve original, unpublished research problems that resemble the work of a capable graduate student starting an independent project. Google's "Gemini 3 Pro Preview" reached just 9.1 percent accuracy while using 10 percent fewer tokens than OpenAI's...