Deep learning models for Kellgren-Lawrence (KL) grading often report optimistic performance due to data leakage and fail to generalize across institutions because of domain shift. To address this reproducibility crisis, we introduce KL-FuseNet, a multitask architecture that fuses global (ConvNeXt-Base) and local (ResNet-50) features to predict ordinal grades, label distributions, and binary severity (KL ≥ 2). Using strict patient-wise stratified splits on an internal Osteoarthritis Initiative dataset
