Causal inference for credit risk: why prediction alone isn't enough

There's a pattern I've seen repeatedly in financial ML: a model achieves excellent predictive performance (AUC above 0.80, stable on holdout) and the team ships it. Then, six months later, someone asks "but why is the model denying more applicants from this postal code?" and nobody has a good answer. Prediction and causation are different things, and conflating them is especially expensive in credit risk. When you train a credit risk model, you're typically predicting P(default | features).
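
To make the postal-code question concrete, here is a minimal simulation sketch in Python using only NumPy. Everything in it is an assumption made up for illustration: the variable names (`low_income`, `in_postcode_a`), the probabilities, and the idea that income is the only confounder. By construction, postal code has no causal effect on default, yet it is still predictive of default because it is correlated with income.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    """Generate hypothetical applicants where postal code does NOT cause default."""
    rng = np.random.default_rng(seed)
    # Hypothetical confounder: half of applicants are low income.
    low_income = rng.random(n) < 0.5
    # Postal code "A" is more common among low-income applicants.
    in_postcode_a = rng.random(n) < np.where(low_income, 0.8, 0.2)
    # Default depends on income only (assumed rates: 20% vs 5%).
    default = rng.random(n) < np.where(low_income, 0.20, 0.05)
    return low_income, in_postcode_a, default

def naive_gap(in_postcode_a, default):
    """Difference in default rates: P(default | A) - P(default | not A)."""
    return default[in_postcode_a].mean() - default[~in_postcode_a].mean()

def adjusted_gap(low_income, in_postcode_a, default):
    """Same difference computed within each income stratum, then averaged
    with weights equal to each stratum's share of the population."""
    gaps, weights = [], []
    for stratum in (True, False):
        mask = low_income == stratum
        in_a = default[mask & in_postcode_a].mean()
        not_a = default[mask & ~in_postcode_a].mean()
        gaps.append(in_a - not_a)
        weights.append(mask.mean())
    return float(np.average(gaps, weights=weights))

if __name__ == "__main__":
    low_income, in_postcode_a, default = simulate()
    print("naive gap:   ", naive_gap(in_postcode_a, default))
    print("adjusted gap:", adjusted_gap(low_income, in_postcode_a, default))
```

Under these assumed rates, the naive gap is about nine percentage points in expectation, while the income-adjusted gap is close to zero. A model that predicts P(default | features) would be right to use postal code as a signal here. But the signal says nothing about what would happen if an applicant moved. The adjustment also only works because the confounder is observed in this toy setup, which real portfolios rarely guarantee.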