Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron

Hao Wu
Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved...