CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Karsten Kreis; Shizhe Diao
Publication: Advances in Neural Information Processing Systems (NeurIPS)