
What Is Flash Attention?

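Flash attention is an algorithm for computing exact attention in transformers more efficiently. Standard attention writes the full N × N matrix of attention scores out to the GPU's high-bandwidth memory (HBM) and reads it back. Flash attention avoids this by splitting the queries, keys, and values into blocks that fit in fast on-chip SRAM, computing the softmax incrementally block by block, and writing back only the output (plus small per-row softmax statistics). This cuts memory traffic, which largely determines the runtime of standard attention on GPUs, and speeds up both training and inference. Because nothing is approximated, it produces the same results as standard attention (up to floating-point rounding). Its extra memory grows linearly rather than quadratically with sequence length, which matters most for long sequences.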
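The central idea, incremental ("online") softmax over blocks, can be illustrated without a GPU. The sketch below is a minimal pure-Python illustration under stated assumptions, not the flash attention kernel itself. `standard_attention` and `tiled_attention` are hypothetical helper names, and `block_size` stands in for a tile small enough to fit in on-chip SRAM. For each query, the tiled version walks over the keys and values one block at a time and keeps only three running quantities: the largest score seen so far, the softmax denominator, and an unnormalized weighted sum of value rows. When a new block raises the maximum, the earlier quantities are rescaled. The full row of scores is never stored, yet the result matches standard attention up to rounding. For clarity it handles one query at a time, whereas the real kernel also tiles the queries and runs as a single fused GPU kernel.

```python
import math
import random


def standard_attention(Q, K, V):
    """Reference attention: builds every score for a query, then softmax, then weights V."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    d_v = len(V[0])
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(d_v)])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Hypothetical tiled variant: visits K/V in blocks and keeps only running statistics."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    d_v = len(V[0])
    out = []
    for q in Q:
        m = -math.inf       # running maximum of scores seen so far
        l = 0.0             # running sum of exp(score - m)
        acc = [0.0] * d_v   # running sum of exp(score - m) * value row
        for start in range(0, len(K), block_size):
            K_blk = K[start:start + block_size]
            V_blk = V[start:start + block_size]
            scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K_blk]
            m_new = max(m, max(scores))
            # Rescale earlier partial sums to the new maximum (exp(-inf) == 0.0 on the first block).
            correction = math.exp(m - m_new)
            l *= correction
            acc = [a * correction for a in acc]
            for s, v in zip(scores, V_blk):
                p = math.exp(s - m_new)
                l += p
                for j in range(d_v):
                    acc[j] += p * v[j]
            m = m_new
        out.append([a / l for a in acc])  # normalize once, at the end
    return out


# Usage: compare the two on random inputs.
random.seed(0)
N, d = 5, 4
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
ref = standard_attention(Q, K, V)
tiled = tiled_attention(Q, K, V, block_size=2)
max_err = max(abs(a - b) for r1, r2 in zip(ref, tiled) for a, b in zip(r1, r2))
print(f"max abs difference: {max_err:.2e}")
```

The difference printed should be at the level of floating-point rounding, since both functions compute the same exact softmax. For a sense of scale, storing the score matrix for a single 16,384-token sequence in 16-bit precision takes 16,384² × 2 bytes = 512 MiB per attention head, which is the buffer the tiled approach never materializes.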
