FP8 E5M2 - 8-Bit Floating Point

OCP 8-bit format with wider range: 5 exponent bits, 2 mantissa bits

Bit Layout

FP8 E5M2 packs a full floating-point number into just 8 bits: 1 sign bit, 5 exponent bits, and 2 mantissa bits. The 5 exponent bits give it the same exponent range as FP16 (its largest finite value is 57,344, against 65,504 for FP16), while the 2 mantissa bits provide very coarse precision: only 4 representable values within each power-of-2 interval.
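
As a minimal sketch of the layout, the snippet below splits one byte into its three fields. It assumes the usual ordering with the sign in the most significant bit, followed by the exponent and then the mantissa; the function name `split_e5m2` is only illustrative.

```python
def split_e5m2(byte: int) -> tuple[int, int, int]:
    """Split an 8-bit E5M2 encoding into (sign, exponent, mantissa) fields.

    Assumed layout, most significant bit first: S EEEEE MM.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    sign = (byte >> 7) & 0x1       # 1 sign bit
    exponent = (byte >> 2) & 0x1F  # 5 exponent bits (biased)
    mantissa = byte & 0x3          # 2 mantissa bits
    return sign, exponent, mantissa


print(split_e5m2(0x3C))  # fields of the encoding for 1.0
print(split_e5m2(0xC5))  # a negative value with a non-zero mantissa
```

The fields are returned raw: the exponent is still biased and the mantissa is still an integer in 0..3.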

Overview

FP8 E5M2 is one of two 8-bit floating-point formats defined by the OCP (Open Compute Project) Microscaling Specification. It prioritizes dynamic range over precision, making it the FP8 variant better suited for gradient representation during training.

The OCP variant of E5M2 follows standard IEEE 754 rules for special values, supporting both infinity and NaN. This makes it a natural "tiny FP16" since it shares FP16's exponent structure (5 bits, bias 15).
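
One way to see the "tiny FP16" relationship: because the sign and exponent fields line up, an E5M2 byte can be read as the high byte of an FP16 value whose low 8 mantissa bits are zero. The sketch below relies on that observation and on Python's `struct` half-precision format code `"e"`; the function name is hypothetical.

```python
import struct


def e5m2_to_float_via_fp16(byte: int) -> float:
    """Decode an E5M2 byte by treating it as the high byte of an FP16 value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    # Little-endian FP16: low byte first (all zeros), then the E5M2 byte.
    return struct.unpack("<e", bytes([0x00, byte]))[0]


print(e5m2_to_float_via_fp16(0x3C))  # one
print(e5m2_to_float_via_fp16(0x7B))  # largest finite value
print(e5m2_to_float_via_fp16(0x7C))  # positive infinity
```

This covers normal numbers, subnormals, infinities, and NaN without any format-specific logic, since FP16 already implements the same rules.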

FP8 E5M2 is supported on NVIDIA Hopper (H100) and Ada Lovelace GPUs, AMD MI300, and Intel Gaudi accelerators.

E5M2 vs E4M3: E5M2 has wider range but lower precision (4 values per power of 2). E4M3 has narrower range but higher precision (8 values per power of 2). Typically, E5M2 is used for gradients and E4M3 for activations/weights.
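
The trade-off can be put in numbers with a small sketch. The E4M3 parameters used here (bias 7, largest finite value 448) are not given in the text above and are taken as assumptions from the OCP FP8 definition; the dictionary and function names are illustrative only.

```python
# Parameters of the two OCP FP8 formats. The E4M3 entries are assumptions
# brought in for comparison; only the E5M2 entries follow from this page.
FORMATS = {
    "E5M2": {"mantissa_bits": 2, "bias": 15, "max_finite": 1.75 * 2.0 ** 15},
    "E4M3": {"mantissa_bits": 3, "bias": 7, "max_finite": 1.75 * 2.0 ** 8},
}


def values_per_power_of_two(fmt: str) -> int:
    """Number of representable values in one interval [2**k, 2**(k+1))."""
    return 2 ** FORMATS[fmt]["mantissa_bits"]


def spacing(fmt: str, k: int) -> float:
    """Gap between neighbouring normal values in [2**k, 2**(k+1))."""
    return 2.0 ** (k - FORMATS[fmt]["mantissa_bits"])


for name, params in FORMATS.items():
    print(name, values_per_power_of_two(name), spacing(name, 0), params["max_finite"])
```

Near 1.0 the gap between neighbouring values is 0.25 for E5M2 and 0.125 for E4M3, while the largest finite value differs by a factor of 128 in the other direction.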

Encoding Rules

Normal Numbers

value = (-1)^sign × 2^(exponent - 15) × (1 + mantissa / 4)

With only 2 mantissa bits, there are exactly 4 representable values in each power-of-2 interval: 1.00, 1.25, 1.50, and 1.75 (times the power of 2). The bias of 15 is calculated as 2^(e-1) - 1, where e is the number of exponent bits (5).
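
A short sketch of the normal-number rule, assuming normal numbers are those with an exponent field from 1 to 30 (0 and 31 being reserved for subnormals and special values). The names `decode_normal` and `EXPONENT_BITS` are illustrative.

```python
EXPONENT_BITS = 5
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # evaluates to 15


def decode_normal(sign: int, exponent: int, mantissa: int) -> float:
    """Apply the normal-number formula to raw E5M2 field values."""
    if sign not in (0, 1) or not 0 <= mantissa <= 3:
        raise ValueError("sign must be 0 or 1, mantissa must be in 0..3")
    if not 1 <= exponent <= 30:
        raise ValueError("normal numbers use exponent fields 1..30")
    return (-1.0) ** sign * 2.0 ** (exponent - BIAS) * (1 + mantissa / 4)


# The four values in the interval [1, 2): exponent field 15, mantissa 0..3.
print([decode_normal(0, 15, m) for m in range(4)])
```

Changing the exponent field scales all four values by the same power of 2, which is why every interval holds exactly four of them.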

Subnormal Numbers

value = (-1)^sign × 2^(-14) × (mantissa / 4)
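
The subnormal rule applies when the exponent field is 0: the implicit leading 1 is dropped and the exponent is fixed at -14. A minimal sketch, with a hypothetical function name:

```python
def decode_subnormal(sign: int, mantissa: int) -> float:
    """Apply the subnormal formula (exponent field equal to 0)."""
    if sign not in (0, 1) or not 0 <= mantissa <= 3:
        raise ValueError("sign must be 0 or 1, mantissa must be in 0..3")
    return (-1.0) ** sign * 2.0 ** -14 * (mantissa / 4)


# Mantissa 0 is zero; mantissas 1..3 are the three positive subnormals.
print([decode_subnormal(0, m) for m in range(4)])
```

The smallest positive subnormal is 2^(-16), and the three subnormals are evenly spaced by that amount up to the smallest normal number, 2^(-14).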

Special Values
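
As a sketch of the IEEE 754-style rules mentioned in the Overview, the function below classifies a byte by its fields: an all-ones exponent with a zero mantissa is an infinity, an all-ones exponent with a non-zero mantissa is a NaN, and an all-zeros exponent is a zero or a subnormal. The function name and the returned labels are illustrative.

```python
def classify_e5m2(byte: int) -> str:
    """Return a label describing the kind of value an E5M2 byte encodes."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    sign = byte >> 7
    exponent = (byte >> 2) & 0x1F
    mantissa = byte & 0x3
    if exponent == 0x1F:  # all-ones exponent: infinity or NaN
        if mantissa == 0:
            return "-inf" if sign else "+inf"
        return "nan"
    if exponent == 0:     # all-zeros exponent: zero or subnormal
        if mantissa == 0:
            return "-0" if sign else "+0"
        return "subnormal"
    return "normal"


for pattern in (0x00, 0x80, 0x01, 0x3C, 0x7C, 0xFC, 0x7F):
    print(f"0x{pattern:02X}", classify_e5m2(pattern))
```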

Dynamic Range & Precision

Special Values & Bit Patterns

Format Comparison

Where FP8 E5M2 Is Used

E5M2 vs E4M3 in practice: Most FP8 training workflows use both formats: E4M3 for forward-pass weights and activations (higher precision), E5M2 for backward-pass gradients (wider dynamic range). MXFP8 block scaling can eliminate the need for E5M2 entirely by using E4M3 for all tensors.
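
To illustrate what storing a gradient in E5M2 involves, here is a sketch of converting a Python float to the nearest E5M2 encoding. It assumes round-to-nearest-even and IEEE-style overflow to infinity; real training stacks often apply a scaling factor first and may saturate to the largest finite value instead, neither of which is modeled here. The function name is hypothetical.

```python
import math


def float_to_e5m2(x: float) -> int:
    """Round a float to the nearest E5M2 encoding (ties to even)."""
    if math.isnan(x):
        return 0x7E  # one of the NaN encodings
    sign = 0x80 if math.copysign(1.0, x) < 0 else 0x00
    a = abs(x)
    if math.isinf(a):
        return sign | 0x7C
    if a < 2.0 ** -14:
        # Subnormal range: fixed step of 2**-16, and the number of steps is
        # the bit pattern itself (4 steps lands on the smallest normal).
        bits = round(a / 2.0 ** -16)
    else:
        e = math.frexp(a)[1] - 1      # floor(log2(a))
        step = 2.0 ** (e - 2)         # spacing within this power-of-2 interval
        q = round(a / step)           # 4..8; Python rounds exact ties to even
        bits = ((e + 15) << 2) + (q - 4)  # q == 8 carries into the exponent
    # Anything at or beyond the infinity pattern overflows to infinity.
    return sign | min(bits, 0x7C)


for g in (1.0, 1.1, 1.375, 3.0e-5, 1.0e6):
    print(g, f"0x{float_to_e5m2(g):02X}")
```

Rounding is done directly from the input rather than through an intermediate FP16 value, which avoids double-rounding errors near ties.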