Sub-byte signed integer: extreme quantization for LLM inference and edge deployment
A 4-bit integer (also called a nibble) stores one of 16 possible values. For INT4, two's complement gives the range -8 to 7. For the unsigned variant (0–15), see UINT4.
4-bit integers are a sub-byte format, meaning two INT4 values can be packed into a single byte. While not a standard hardware data type on most processors, INT4 has become critically important in machine learning for extreme quantization of large language models (LLMs).
With only 16 possible values, INT4 provides 8× compression over FP32 and 2× over INT8. This makes it possible to run large models (7B, 13B, 70B parameters) on consumer GPUs and mobile devices that would otherwise require enterprise hardware.
Techniques like GPTQ, AWQ, and GGML/GGUF Q4 quantize LLM weights to 4 bits. The typical approach:
weight = int4_value × scaleWith only 16 possible bit patterns, here is the complete set:
With only 4 bits, you can click through every possible value. Try flipping the sign bit (leftmost) to see two's complement in action.
.s4 packed types for signed 4-bit MMA instructions (m16n8k32, m16n8k64 shapes on Turing/Ampere). A100 supports sparse INT4 Tensor Core acceleration for inference.INT4 with a packing rule of two 4-bit elements per byte (first element in LSB). CUTLASS packs eight int4b_t elements into a single 32-bit GPU register.int4 as a narrow signed integer NumPy extension (4-bit, stored in a byte, with defined cast behavior).