INT8 - 8-Bit Signed Integer

The signed byte: key to ML quantization and inference acceleration

Bit Layout

An 8-bit integer (one byte) is the smallest addressable unit of memory in most computer architectures. INT8 stores signed values in two's complement, which gives a range of -128 to 127. For the unsigned variant (0–255), see UINT8.
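
In two's complement, the most significant bit carries a weight of -128 instead of +128. The other seven bits keep their usual positive weights. The Python sketch below decodes and encodes bytes under that rule. `decode_int8` and `encode_int8` are illustrative names, not a library API:

```python
def decode_int8(bits: str) -> int:
    """Interpret an 8-character bit string (MSB first) as a two's complement INT8."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 characters of '0'/'1'")
    value = 0
    for position, bit in enumerate(bits):
        weight = 2 ** (7 - position)
        if position == 0:
            weight = -weight  # the sign bit carries weight -2**7 = -128
        value += weight * int(bit)
    return value


def encode_int8(value: int) -> str:
    """Return the 8-bit two's complement pattern of a value in [-128, 127]."""
    if not -128 <= value <= 127:
        raise ValueError("value out of INT8 range")
    return format(value & 0xFF, "08b")  # masking to 8 bits yields the two's complement pattern


print(decode_int8("10000011"))  # -128 + 2 + 1 = -125
print(encode_int8(-1))          # 11111111
```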

Overview

The 8-bit integer is perhaps the most fundamental data type in computing. A single byte can represent 256 (2^8) distinct values. That is enough for ASCII characters, pixel color channels, and, increasingly, quantized neural network parameters.

In machine learning, INT8 quantization has become one of the most important techniques for deploying models efficiently. Converting FP32 weights and activations to INT8 cuts memory by 4× (1 byte per value instead of 4). It also speeds up inference significantly on hardware with INT8 support, which includes most modern CPUs and GPUs.

INT8 in ML Quantization

ML quantization maps floating-point values to integers using a scale factor and an optional zero point, then clamps the result to the INT8 range:

quantized_value = clamp(round(float_value / scale) + zero_point, -128, 127)
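
The formula translates directly into code. This Python sketch assumes a single scale and zero point for the whole tensor. It uses the built-in `round()`, which rounds ties to even; frameworks may use a different tie-breaking rule. The helper names are hypothetical. The inverse mapping, `(q - zero_point) * scale`, recovers an approximation of the original float:

```python
def quantize(x: float, scale: float, zero_point: int) -> int:
    """Affine-quantize a float to INT8, clamping to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an INT8 value back to an approximate float."""
    return (q - zero_point) * scale


scale, zero_point = 0.05, 0
q = quantize(1.234, scale, zero_point)    # 1.234 / 0.05 = 24.68, rounded to 25
print(q, dequantize(q, scale, zero_point))
print(quantize(10.0, scale, zero_point))  # 10.0 / 0.05 = 200, clamped to 127
```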

The scale and zero point are chosen so that the typical range of the weights or activations maps onto the INT8 range. Values outside that range are clamped to -128 or 127. This introduces quantization error (from rounding and from clipping). For well-calibrated models, however, the impact on accuracy is typically small.
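
One common way to choose these parameters is min/max calibration. You observe the range of values and stretch it across all 256 INT8 codes. The sketch below assumes asymmetric, per-tensor quantization and forces 0.0 into the range so that zero is represented exactly. Other calibration strategies exist, and this is only one simple option. The function names are illustrative:

```python
def calibrate_minmax(values, qmin=-128, qmax=127):
    """Asymmetric min/max calibration: map [lo, hi] onto [qmin, qmax]."""
    lo = min(min(values), 0.0)   # widen the range to include 0.0 ...
    hi = max(max(values), 0.0)   # ... so that zero maps exactly to zero_point
    if hi == lo:                 # all values are 0.0; any scale works
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization, clamped to [qmin, qmax]."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


values = [-1.0, -0.3, 0.0, 0.25, 0.8, 2.0]   # toy "observed" activations
scale, zero_point = calibrate_minmax(values)
errors = [abs((quantize(v, scale, zero_point) - zero_point) * scale - v) for v in values]
print(f"scale={scale:.6f} zero_point={zero_point} max_error={max(errors):.6f}")
```

Within the calibrated range, the round-trip error stays at or below half a quantization step (`scale / 2`). Values outside it additionally suffer clipping error.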

Why INT8 is so popular for inference: INT8 matrix multiplications are 2–4× faster than FP32 on modern CPUs (via VNNI/AMX instructions) and GPUs. Combined with the 4× memory savings, INT8 quantization often yields a 2–4× end-to-end speedup with less than 1% accuracy loss on well-quantized models.
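
Part of the speedup comes from doing the multiply-accumulate work in integers. INT8 products are summed into a wider accumulator (typically INT32 on VNNI-style hardware), and a single float rescale is applied at the end. This pure-Python sketch only illustrates the arithmetic, not the performance. It assumes symmetric per-tensor quantization (zero point 0) with the range restricted to [-127, 127], which is one common convention. The toy vectors and helper names are hypothetical:

```python
def quantize_symmetric(values, scale):
    """Symmetric INT8 quantization (zero point 0), restricted to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(qa, qb):
    """Integer dot product of two INT8 vectors.

    Each product fits in 16 bits, but the running sum needs a wider accumulator;
    hardware typically uses INT32. Python ints are unbounded, so this only models it.
    """
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


x = [0.5, -1.2, 0.9, 2.0]    # toy activations
w = [0.1, 0.4, -0.3, 0.05]   # toy weights
sx = max(abs(v) for v in x) / 127   # per-tensor scales from the max magnitude
sw = max(abs(v) for v in w) / 127

acc = int8_dot(quantize_symmetric(x, sx), quantize_symmetric(w, sw))
approx = acc * sx * sw               # a single float rescale after integer accumulation
exact = sum(a * b for a, b in zip(x, w))
print(acc, round(approx, 4), round(exact, 4))
```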

Range & Properties
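
The range is asymmetric: there is one more negative value than positive ones, so -128 has no positive counterpart. The sketch below derives the range from the bit width. It then contrasts wrapping overflow with the saturating clamp used when quantizing. Wrapping is what fixed-width two's complement addition produces on typical hardware and what Rust's `wrapping_add` gives; in C, signed overflow is undefined behavior. The function names are illustrative:

```python
BITS = 8
INT8_MIN = -(2 ** (BITS - 1))   # -128
INT8_MAX = 2 ** (BITS - 1) - 1  # 127


def wrap_int8(x: int) -> int:
    """Reduce x modulo 256 into [-128, 127], mimicking wrapping two's complement overflow."""
    return (x + 128) % 256 - 128


def saturate_int8(x: int) -> int:
    """Clamp x to [-128, 127], the behavior used when quantizing."""
    return max(INT8_MIN, min(INT8_MAX, x))


print(INT8_MAX - INT8_MIN + 1)      # 256 distinct values
print(wrap_int8(INT8_MAX + 1))      # 127 + 1 wraps around to -128
print(wrap_int8(-INT8_MIN))         # -(-128) = 128 also wraps to -128
print(saturate_int8(INT8_MAX + 1))  # saturation stops at 127
```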

Key Bit Patterns
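
A handful of bit patterns are worth memorizing. This sketch reinterprets each byte as signed using Python's `int.from_bytes(..., signed=True)`. The selection of patterns is illustrative, and `as_int8` is a hypothetical helper name:

```python
def as_int8(byte: int) -> int:
    """Reinterpret an unsigned byte value (0-255) as a signed two's complement INT8."""
    return int.from_bytes(bytes([byte]), "big", signed=True)


# Illustrative selection: zero, one, the extremes, and the values next to them.
KEY_PATTERNS = [0x00, 0x01, 0x7F, 0x80, 0x81, 0xFE, 0xFF]

for byte in KEY_PATTERNS:
    print(f"0x{byte:02X}  {byte:08b}  {as_int8(byte):5d}")
```

0x7F is the maximum (127), 0x80 is the minimum (-128), and 0xFF (all ones) is -1.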


Format Comparison
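
Per-value storage cost is what drives the 4× saving over FP32. This sketch reads element sizes from Python's `array` module and derives integer ranges from the bit width. The type codes are `'b'`, `'B'`, `'h'` and `'f'`. Their sizes are platform-defined, though `'f'` (a C `float`) is 4 bytes on mainstream platforms. The comparison set (INT8, UINT8, INT16, FP32) and the helper name `int_range` are assumptions for illustration:

```python
from array import array

# Python array type codes: 'b' signed char, 'B' unsigned char,
# 'h' signed short, 'f' C float (FP32 on mainstream platforms).
FORMATS = [("INT8", "b", True), ("UINT8", "B", False), ("INT16", "h", True), ("FP32", "f", None)]


def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Smallest and largest value of a two's complement (signed) or unsigned integer."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


n_values = 1_000_000
for name, code, signed in FORMATS:
    size = array(code).itemsize
    total_mb = size * n_values / 1e6
    if signed is None:
        rng = "floating point"
    else:
        lo, hi = int_range(8 * size, signed)
        rng = f"{lo}..{hi}"
    print(f"{name:6s} {size} B/value  {total_mb:4.0f} MB per 1M values  range {rng}")
```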

Where INT8 Is Used