Convergence Theory of Sharpness-Aware Minimization with Interpolating Neural Networks

We present a theoretical analysis of Sharpness-Aware Minimization (SAM) applied to training loss minimization with neural networks and smooth activations. Unlike prior work that measures convergence to stationary points in standard non-convex smooth optimization settings under noise assumptions, we leverage intrinsic properties of neural network loss landscapes to establish a convergence rate of Õ<inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math nota