
What Is GPTQ?

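GPTQ is a post-training quantization technique: it reduces the precision of an already-trained language model's weights, typically to 3 or 4 bits each, to shrink the model's memory footprint and speed up inference. It quantizes the model one layer at a time, choosing the quantized weights so that each layer's output stays as close as possible to the output of the original layer. This lets large models run on hardware with less memory than the full-precision model would need.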
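
As a rough illustration of the layer-by-layer idea, the sketch below quantizes one layer's weights to 4 bits with plain round-to-nearest and measures the resulting error in that layer's output. This is only a baseline and not GPTQ itself: GPTQ goes further and adjusts the not-yet-quantized weights to compensate for the error each rounding step introduces. The function names, the 4x8 layer size, and the random calibration inputs are all assumptions made for the example.

```python
import random


def quantize_row(row, bits=4):
    """Round-to-nearest uniform quantization of one row of weights.

    Returns the integer codes, the dequantized (approximate) weights,
    and the step size between adjacent quantization levels.
    """
    levels = 2 ** bits - 1                      # 15 steps -> 16 values for 4 bits
    lo, hi = min(row), max(row)
    scale = (hi - lo) / levels or 1.0           # fall back to 1.0 for a constant row
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in row]
    dequant = [lo + c * scale for c in codes]
    return codes, dequant, scale


def layer_output(weights, x):
    """Output of a linear layer (no bias): one dot product per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def layer_error(weights, quantized, inputs):
    """Sum of squared differences between original and quantized layer outputs."""
    total = 0.0
    for x in inputs:
        y = layer_output(weights, x)
        y_q = layer_output(quantized, x)
        total += sum((a - b) ** 2 for a, b in zip(y, y_q))
    return total


rng = random.Random(0)
# Hypothetical layer: 4 output units, 8 input features.
weights = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
# Hypothetical calibration inputs used to measure the output error.
inputs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]

all_codes, quantized, scales = [], [], []
for row in weights:
    codes, dequant, scale = quantize_row(row, bits=4)
    all_codes.append(codes)
    quantized.append(dequant)
    scales.append(scale)

error = layer_error(weights, quantized, inputs)
print(f"4-bit round-to-nearest layer output error: {error:.4f}")
```

The quantity returned by `layer_error` is the kind of per-layer output error that GPTQ tries to keep small; a method that compensates for rounding error would aim for a lower value than this baseline on the same inputs.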

