OCP 8-bit format with wider range: 5 exponent bits, 2 mantissa bits
FP8 E5M2 packs a full floating-point number into just 8 bits: 1 sign bit, 5 exponent bits, and 2 mantissa bits. The 5 exponent bits give it the same exponent range as FP16 (its largest finite value is 57344, versus 65504 for FP16), while the 2 mantissa bits provide very coarse precision, with only 4 representable values within each power-of-2 interval.
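The range and precision figures follow directly from the bit widths. The sketch below is illustrative only: `format_limits` is a hypothetical helper name, and it assumes IEEE 754 style conventions (top exponent code reserved for infinity and NaN, subnormals supported), which hold for both E5M2 and FP16.

```python
def format_limits(exp_bits, man_bits):
    """Return key limits of an IEEE 754 style binary format.

    Assumes the all-ones exponent code is reserved for inf/NaN and that
    subnormals are supported (true for both E5M2 and FP16).
    """
    bias = 2 ** (exp_bits - 1) - 1
    # Largest exponent code usable by finite numbers is all-ones minus one.
    max_exp = (2 ** exp_bits - 2) - bias
    return {
        "bias": bias,
        "max_finite": (2 - 2.0 ** -man_bits) * 2.0 ** max_exp,
        "min_normal": 2.0 ** (1 - bias),
        "min_subnormal": 2.0 ** (1 - bias - man_bits),
        "values_per_binade": 2 ** man_bits,
    }


if __name__ == "__main__":
    for name, (e, m) in {"E5M2": (5, 2), "FP16": (5, 10)}.items():
        print(name, format_limits(e, m))
```

Both formats share the bias (15) and the smallest normal value (2^-14). They differ in how finely each power-of-2 interval is subdivided, which is also why the largest finite values (57344 and 65504) are close but not identical.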
FP8 E5M2 is one of two 8-bit floating-point formats (alongside E4M3) defined by the OCP (Open Compute Project) 8-bit Floating Point Specification (OFP8); the OCP Microscaling specification reuses both as element types. E5M2 prioritizes dynamic range over precision, making it the FP8 variant better suited for gradient representation during training.
The OCP variant of E5M2 follows standard IEEE 754 rules for special values, supporting both infinity and NaN. This makes it a natural "tiny FP16" since it shares FP16's exponent structure (5 bits, bias 15).
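Because E5M2 shares FP16's sign and exponent layout, an E5M2 byte has the same bit pattern as the high byte of an FP16 value whose low 8 mantissa bits are zero. The sketch below uses that to decode E5M2 with nothing but Python's standard `struct` module; the function name `e5m2_to_float` is hypothetical, not part of any library.

```python
import math
import struct


def e5m2_to_float(byte):
    """Decode an E5M2 byte by treating it as the high byte of an FP16 value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an integer in 0..255")
    # "<e" is little-endian IEEE 754 binary16: low byte first, then high byte.
    # The low byte holds the 8 mantissa bits that E5M2 does not have, so it is 0.
    return struct.unpack("<e", bytes([0x00, byte]))[0]


if __name__ == "__main__":
    for code in (0x3C, 0x7B, 0x7C, 0x7E, 0x01, 0xC0):
        print(f"0x{code:02X} -> {e5m2_to_float(code)}")
```

Here 0x7C decodes to infinity and 0x7E to NaN, matching the IEEE 754 special-value rules described above, and 0x01 decodes to the smallest subnormal, 2^-16.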
FP8 E5M2 is supported on NVIDIA Hopper (H100) and Ada Lovelace GPUs, AMD MI300, and Intel Gaudi accelerators.
With only 2 mantissa bits, there are exactly 4 representable values in each power-of-2 interval: 1.00, 1.25, 1.50, and 1.75 (times the power of 2). The bias of 15 is calculated as 2^(e-1) - 1, where e is the number of exponent bits (5): 2^4 - 1 = 15.
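The field arithmetic can be sketched as a manual decoder that splits a byte into sign, exponent, and mantissa and applies the bias. The names `decode_e5m2` and `binade_values` are hypothetical helpers introduced for illustration.

```python
EXP_BITS = 5
MAN_BITS = 2
BIAS = 2 ** (EXP_BITS - 1) - 1  # 2^(5-1) - 1 = 15


def decode_e5m2(byte):
    """Decode one E5M2 byte from its sign, exponent, and mantissa fields."""
    sign = -1.0 if (byte >> 7) & 0x1 else 1.0
    exp = (byte >> MAN_BITS) & 0x1F
    man = byte & 0x3
    if exp == 0x1F:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:
        # Zero and subnormals: no implicit leading 1, exponent fixed at 1 - BIAS.
        return sign * (man / 4) * 2.0 ** (1 - BIAS)
    # Normal numbers: implicit leading 1 plus mantissa in steps of 1/4.
    return sign * (1 + man / 4) * 2.0 ** (exp - BIAS)


def binade_values(exp_field):
    """Return the 4 positive values that share one exponent field."""
    return [decode_e5m2((exp_field << MAN_BITS) | m) for m in range(2 ** MAN_BITS)]


if __name__ == "__main__":
    print(binade_values(15))  # exponent field 15 is the interval [1, 2)
    print(binade_values(16))  # exponent field 16 is the interval [2, 4)
```

With exponent field 15 (unbiased exponent 0) the four values are 1.0, 1.25, 1.5, and 1.75; each higher exponent field doubles them.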
PyTorch exposes torch.float8_e5m2 as a native dtype (supports NaN/inf, follows IEEE 754). The ONNX specification defines FLOAT8E5M2 = 19 with a note: "mostly used for gradients."
CUTLASS provides float_e5m2_t with inline PTX conversion instructions. Triton exports float8e5 as a kernel dtype, and the MLIR AMDGPU dialect supports E5M2 via dot4.f32.bf8 instructions on gfx11+.
The ml_dtypes package provides float8_e5m2 (and float8_e5m2fnuz) as NumPy custom dtype extensions for use in JAX and TensorFlow pipelines.