Eran Raviv7/28/2025

Dot Product in the Attention Mechanism

Eran Raviv
The dot product of two embedding vectors and with dimension is defined as Hardly the first thing that jumps to mind when thinking about a “similarity score”. Indeed, the result of a dot product is a single numbers (a scalar), with no predefined range (e.g. not between zero and one). So, it’s hard to quantify whether a particular score is high/low on its own. Still, deep learning Transformer family of models rely heavily on the dot product in the attention mechanism; to weigh the importance of...