Abstract Antigen receptor numbering allows delineation of the antigen-binding regions of antibodies and T cell receptors from sequence alone. Numbering is currently achieved by aligning to a reference set. This approach may produce different numbering depending on the reference set used, or fail on sequences from rare species or formats. We present a method (ANARCII) which requires no alignment step and is based on a Seq2Seq language model. ANARCII improves upon existing methods through more consisten
ANARCII enables alignment-free antigen receptor numbering using a generalised language model
Alexander Greenshields‐Watson·Charlotte M. Deane·Sarah A. Robinson·Ben H. Williams·G. Gordon·Henriette Capel·Yushi Li·Fabian C. Spoendlin·Broncio Aguilar-Sanjuan·Fergus Boyles·Parth Agarwal
