LinkBERT: Improving Language Model Training with Document Links

Language Model Pretraining

Language models (LMs), such as BERT [1] and the GPT series [2], achieve remarkable performance on many natural language processing (NLP) tasks and are now the foundation of today's NLP systems [3]. These models serve important roles in products and tools we use every day, such as search engines like Google [4] and personal assistants like Alexa [5]. LMs are powerful because they can be pretrained via self-supervised learning on massive amounts of text data from the web, without the need for human-annotated labels.
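To make the idea of self-supervision concrete, here is a minimal sketch of how BERT-style masked language modeling turns raw text into a training signal: some tokens are hidden, and the model's objective is to predict them from context. This is an illustrative toy (word-level tokens, a simplified masking rule), not the actual BERT recipe, which uses subword tokenizers and occasionally replaces tokens with random words instead of the mask symbol.

```python
import random

def make_mlm_example(tokens, mask_rate=0.15, mask_token="[MASK]", seed=1):
    """Build one masked-LM training pair from a token sequence:
    a corrupted input and target labels at the masked positions."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append(mask_token)  # hide the token from the model
            labels.append(tok)         # the model must recover the original
        else:
            inputs.append(tok)
            labels.append(None)        # no prediction loss at this position
    return inputs, labels

tokens = "language models learn from raw text without human labels".split()
inputs, labels = make_mlm_example(tokens)
```

Because the labels are just the original tokens, any raw web text becomes training data for free, which is what makes pretraining at massive scale possible.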