
What Is GPTQ?

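GPTQ is a post-training quantization technique. It reduces the precision of a trained language model's weights, typically to 3 or 4 bits per weight, which shrinks the model and can speed up inference. It works one layer at a time. Using a small set of calibration inputs, it rounds each layer's weights so that the layer's outputs change as little as possible. After each rounding step, it adjusts the weights that are not yet quantized to compensate for the error, guided by approximate second-order (Hessian) information. Because the quantized weights need far less memory, large models can run on more limited hardware, such as a single GPU.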
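The "layer by layer while minimizing the error" idea can be made concrete with the minimal pure-Python sketch below. It is written for illustration and is not taken from any GPTQ library. It assumes a symmetric per-row quantizer and processes columns left to right. It also updates the inverse Hessian explicitly. The published method adds lazy block updates and a Cholesky reformulation for speed and numerical stability, and this sketch omits both. The helper names (`gptq_quantize`, `rtn_quantize`, `output_error`, `invert`, `row_scales`) are hypothetical. The 1% dampening default and the toy data are assumptions of the sketch.

```python
# Minimal sketch of GPTQ-style quantization for one linear layer (pure Python).
# Assumptions for illustration: symmetric per-row grid, columns processed left
# to right, no block batching, inverse Hessian updated explicitly (no Cholesky).


def invert(m):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def quantize(x, scale, qmax):
    """Round x to the nearest point of the grid {-qmax, ..., qmax} * scale."""
    return max(-qmax, min(qmax, round(x / scale))) * scale


def row_scales(W, qmax):
    """One scale per output row, chosen so the largest weight maps to qmax."""
    return [(max(abs(v) for v in r) / qmax) or 1.0 for r in W]


def rtn_quantize(W, bits=4):
    """Baseline: round every weight to the nearest grid point independently."""
    qmax = 2 ** (bits - 1) - 1
    return [[quantize(v, s, qmax) for v in r]
            for r, s in zip(W, row_scales(W, qmax))]


def gptq_quantize(W, X, bits=4, damp=0.01):
    """Quantize W (rows x cols) using calibration inputs X (cols x samples).

    Columns are rounded one at a time; each rounding error is pushed onto the
    not-yet-quantized columns via the inverse of H = 2 X X^T, the Hessian of
    the squared layer-output error, which is shared by every row of W.
    """
    rows, cols = len(W), len(W[0])
    qmax = 2 ** (bits - 1) - 1
    scales = row_scales(W, qmax)
    W = [list(r) for r in W]  # work on a copy; the caller's W is untouched

    H = [[2.0 * sum(a * b for a, b in zip(X[i], X[j])) for j in range(cols)]
         for i in range(cols)]
    mean_diag = sum(H[i][i] for i in range(cols)) / cols
    for i in range(cols):
        H[i][i] += damp * mean_diag  # dampening keeps H safely invertible
    Hinv = invert(H)

    Q = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        d = Hinv[j][j]
        for i in range(rows):
            Q[i][j] = quantize(W[i][j], scales[i], qmax)
            err = (W[i][j] - Q[i][j]) / d
            for k in range(j + 1, cols):
                W[i][k] -= err * Hinv[j][k]  # compensate in later columns
        # Eliminate column j from the inverse Hessian (one Gaussian step).
        for a in range(j + 1, cols):
            for b in range(j + 1, cols):
                Hinv[a][b] -= Hinv[a][j] * Hinv[j][b] / d
    return Q, scales


def output_error(W, Q, X):
    """Squared Frobenius norm of (W - Q) X: the per-layer error GPTQ targets."""
    total = 0.0
    for w_row, q_row in zip(W, Q):
        for t in range(len(X[0])):
            diff = sum((w - q) * X[k][t]
                       for k, (w, q) in enumerate(zip(w_row, q_row)))
            total += diff * diff
    return total


if __name__ == "__main__":
    # Toy layer: 2 output rows, 4 input features, 5 correlated calibration samples.
    W = [[0.31, -0.72, 0.05, 0.44], [-0.18, 0.27, 0.93, -0.61]]
    X = [[1.0, 0.9, 0.2, 0.7, 0.5],
         [0.8, 1.0, 0.1, 0.6, 0.4],
         [0.1, 0.3, 1.0, 0.2, 0.9],
         [0.5, 0.4, 0.6, 1.0, 0.3]]
    Q, _ = gptq_quantize(W, X, bits=3)
    print("GPTQ error:", output_error(W, Q, X))
    print("RTN error: ", output_error(W, rtn_quantize(W, bits=3), X))
```

The compensation step is what separates this from plain round-to-nearest. If the calibration inputs are uncorrelated, H is diagonal. In that case every `Hinv[j][k]` with `k != j` is zero, and the sketch returns exactly the round-to-nearest result. On real layers the inputs are correlated, and GPTQ uses those correlations to cancel rounding error. On a tiny toy example like the one above, the gap between the two printed errors may be small.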
