Dot Product in the Attention Mechanism
Eran Raviv
The dot product of two embedding vectors and with dimension is defined as
Hardly the first thing that jumps to mind when thinking about a “similarity score”. Indeed, the result of a dot product is a single numbers (a scalar), with no predefined range (e.g. not between zero and one). So, it’s hard to quantify whether a particular score is high/low on its own. Still, deep learning Transformer family of models rely heavily on the dot product in the attention mechanism; to weigh the importance of...
Tags
