Scaling laws for moral machine judgement in large language models

Abstract

Autonomous systems increasingly require moral judgement capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model (LLM) configurations (0.27–1000B parameters) using the moral machine framework, measuring alignment with human preferences in life–death dilemmas. We observe a consistent power-law relationship, with distance from human preferences (D) decreasing with model size (S) as D ∝ S^(−0.10±0.01) (R² = 0.50, p<0.0