Abstract. The UN Agenda 2030 motivates the development of scalable methods and tools for assessing the contribution of research and higher education institutions to sustainable development. Current research efforts suffer from a lack of rigorous evaluation benchmarks. This paper describes a collaboration with sustainability experts aimed at constructing AlmaSDG, the first pilot dataset of scientific articles labeled by multiple annotators across all Sustainable Development Goals (SDGs). AlmaSDG is validated using inter-an