What Is a Vision Transformer?
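A vision transformer, or ViT, splits an image into fixed-size patches, embeds each patch as a token, and processes the resulting sequence with a transformer encoder. Self-attention lets every patch attend to every other patch, so the model can relate distant regions of the image from the first layer onward. It offers an alternative to convolutional networks for image tasks such as classification.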
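To make the patch-and-attend idea concrete, here is a minimal NumPy sketch under stated assumptions. The function names (patchify, self_attention), the 32x32 image, the patch size of 8, and the embedding width of 64 are illustrative choices, not taken from any particular library or model. The weights are random rather than trained, and the sketch uses a single attention head and leaves out parts of a full ViT such as the class token, layer normalization, and MLP blocks.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # stand-in for a real RGB image
patches = patchify(image, patch_size=8)  # 4 x 4 = 16 patches, 8 * 8 * 3 = 192 values each

dim = 64  # hypothetical embedding width
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))    # patch projection
pos_embed = rng.normal(scale=0.02, size=(patches.shape[0], dim))  # one vector per position
tokens = patches @ w_embed + pos_embed

# Random, untrained projection matrices for queries, keys, and values.
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(dim, dim)) for _ in range(3))
out = self_attention(tokens, w_q, w_k, w_v)

print(patches.shape, tokens.shape, out.shape)  # (16, 192) (16, 64) (16, 64)
```

Each output row is a weighted mix of all 16 patch tokens, which is the property that distinguishes this approach from a convolution's local receptive field.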