ai · 9 min read Positional Encoding and Sampling: How the Transformer Finds Position and Picks Its Next Word
Attention cannot tell 'the dog bit the man' from 'the man bit the dog.' Positional encoding fixes that. Then sampling decides what word the model actually says.