Convergence Theory of Sharpness-Aware Minimization with Interpolating Neural Networks

We present a theoretical analysis of Sharpness-Aware Minimization (SAM) applied to training loss minimization for neural networks with smooth activations. Unlike prior works, which bound stationarity measures in standard non-convex smooth optimization settings under noise assumptions, we leverage intrinsic properties of neural network loss landscapes to establish a convergence rate of Õ<tex-math nota