What Is a Transformer Decoder?
A transformer decoder generates an output sequence one token at a time. It uses masked (causal) self-attention, so each position can attend only to itself and earlier positions, never to later ones. This is the core of autoregressive language models such as the GPT family. In sequence-to-sequence models, decoders can also include cross-attention layers that read the encoder's outputs.
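To make the masking concrete, here is a minimal single-head sketch in NumPy. It is an illustration under simplifying assumptions, not how any particular model implements it: the function name `causal_self_attention` and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical, and multi-head projection, layer normalization, and the feed-forward sublayer are all left out.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over x of shape (T, d_model).

    Returns the attended outputs and the (T, T) attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Scaled dot-product scores: row i holds position i's scores for every position.
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Causal mask: True strictly above the diagonal, i.e. at future positions.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax; masked entries become exactly 0 because exp(-inf) == 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights
```

Because future positions receive zero weight, the output at position i depends only on tokens 0 through i. Changing a later token leaves every earlier output unchanged, which is what allows generation to proceed one token at a time.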