FP8 E4M3 (OCP) | Floating Point Format Guide

Bit Layout

FP8 E4M3 uses 4 exponent bits and 3 mantissa bits. The extra mantissa bit (compared to E5M2) doubles the number of distinct values per power-of-2 interval from 4 to 8, at the cost of a narrower exponent range.
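The layout can be illustrated with a small field-splitting helper. This is a minimal sketch that assumes the usual ordering of sign in the top bit, then the 4 exponent bits, then the 3 mantissa bits; the function name `split_e4m3` is made up for illustration.

```python
def split_e4m3(byte: int) -> tuple[int, int, int]:
    """Split an 8-bit E4M3 encoding into (sign, exponent, mantissa) fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an integer in 0..255")
    sign = (byte >> 7) & 0x1       # bit 7
    exponent = (byte >> 3) & 0xF   # bits 6..3
    mantissa = byte & 0x7          # bits 2..0
    return sign, exponent, mantissa


print(split_e4m3(0x38))  # (0, 7, 0): bits 0 0111 000
```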

Overview

FP8 E4M3 is the precision-focused of the two FP8 element formats (E4M3 and E5M2) in the OCP Microscaling Specification. It's designed for storing weights and activations in neural networks, where precision matters more than range.

The OCP variant of E4M3 has an important special rule: it does not support infinity. The bit pattern that would normally encode infinity (max exponent, mantissa = 0) instead represents a normal number, 256. Only the pattern with max exponent and all-ones mantissa (S 1111 111, for either sign bit) is reserved for NaN. This gives E4M3 a higher maximum representable value: 448, versus 240 if the top exponent were reserved for infinity and NaN as in IEEE 754.
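The 448 versus 240 comparison follows directly from the normal-number formula 2^(exponent - 7) × (1 + mantissa / 8). The sketch below works through the arithmetic; the "IEEE-style" case is a hypothetical E4M3 that reserves the whole top exponent, included only for comparison.

```python
BIAS = 7


def normal_value(exponent: int, mantissa: int) -> float:
    """Positive normal value for the given exponent and mantissa fields."""
    return 2.0 ** (exponent - BIAS) * (1 + mantissa / 8)


# OCP E4M3: exponent field 15 is usable; only mantissa 7 is taken by NaN.
ocp_max = normal_value(15, 6)

# Hypothetical IEEE-style E4M3: exponent field 15 is fully reserved,
# so the largest normal uses exponent field 14 with mantissa 7.
ieee_style_max = normal_value(14, 7)

print(ocp_max, ieee_style_max)  # 448.0 240.0
```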

FP8 E4M3 is the preferred format for quantized inference on NVIDIA Hopper (H100) and later GPUs. Combined with E5M2 for gradients, it enables full FP8 training pipelines.

No infinity in OCP E4M3: Unlike standard IEEE 754 formats, OCP E4M3 has no infinity encoding. With saturating conversion, overflow clamps to the maximum normal value (±448) instead of producing infinity. This is intentional, as it prevents training instabilities caused by inf propagation.
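To see why clamping helps, compare what happens downstream of an overflow. The following is a rough sketch in ordinary Python floats: it models only the range clamp (not rounding to E4M3 precision), and `saturate_to_e4m3_range` is an illustrative name, not a library function.

```python
import math

E4M3_MAX = 448.0


def saturate_to_e4m3_range(x: float) -> float:
    """Clamp x to [-448, 448]; NaN passes through unchanged."""
    if math.isnan(x):
        return x
    return max(-E4M3_MAX, min(E4M3_MAX, x))


overflowed = 1.0e4                                # too large for E4M3
as_infinity = math.inf                            # what an inf-producing format would store
as_saturated = saturate_to_e4m3_range(overflowed)

# Infinity turns into NaN under common operations; a clamped value stays finite.
print(as_infinity - as_infinity)    # nan
print(as_saturated - as_saturated)  # 0.0
```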

Encoding Rules

Normal Numbers

value = (-1)^sign × 2^(exponent - 7) × (1 + mantissa / 8)

With 3 mantissa bits, there are 8 representable values per power-of-2 interval: 1.000, 1.125, 1.250, 1.375, 1.500, 1.625, 1.750, 1.875 (times the power of 2). The bias of 7 is calculated as 2^(e-1) - 1, where e is the number of exponent bits (4).
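The bias and the eight significands can be reproduced in a few lines. The sketch below also derives the gap between neighbouring values for a given exponent field, which is not stated in the text but follows from the formula; the `spacing` helper is illustrative.

```python
EXP_BITS = 4
MAN_BITS = 3

# Bias formula from the text: 2^(e-1) - 1 with e = 4 exponent bits.
bias = 2 ** (EXP_BITS - 1) - 1

# Significands 1 + m/8 for m = 0..7.
significands = [1 + m / 2**MAN_BITS for m in range(2**MAN_BITS)]


def spacing(exponent_field: int) -> float:
    """Gap between neighbouring normal values that share this exponent field."""
    return 2.0 ** (exponent_field - bias) / 2**MAN_BITS


print(bias)          # 7
print(significands)  # [1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875]
print(spacing(7))    # 0.125 (values in [1, 2))
print(spacing(15))   # 32.0  (values from 256 up to 448)
```

Because the spacing doubles with every exponent step, absolute precision is finest near zero and coarsest near ±448.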

Subnormal Numbers

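value = (-1)^sign × 2^-6 × (mantissa / 8)

Subnormals apply when the exponent field is 0 and the mantissa is non-zero. The implicit leading 1 is dropped and the exponent is fixed at -6, so the representable magnitudes are mantissa × 2^-9.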
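As a quick check of the subnormal formula, the sketch below lists the values for exponent field 0 and confirms that they join the smallest normal number without a gap. The helper name is illustrative.

```python
def subnormal_value(sign: int, mantissa: int) -> float:
    """Value of an E4M3 encoding whose exponent field is 0."""
    return (-1.0) ** sign * 2.0 ** -6 * (mantissa / 8)


subnormals = [subnormal_value(0, m) for m in range(8)]
smallest_normal = 2.0 ** (1 - 7)  # exponent field 1, mantissa 0

# Subnormals are evenly spaced multiples of 2^-9 ...
assert subnormals == [m * 2.0 ** -9 for m in range(8)]
# ... and the step from the largest subnormal to the smallest normal is also 2^-9.
assert smallest_normal - subnormals[-1] == 2.0 ** -9

print(subnormals[1])    # 0.001953125 (smallest positive value)
print(smallest_normal)  # 0.015625
```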

Special Values (OCP Rules)

Zero: Exponent = 0, Mantissa = 0.
No Infinity: There is no infinity encoding. Saturating conversions clamp overflow to max normal (±448).
NaN: Only the pattern S 1111 111 (max exponent, all-ones mantissa; 0x7F and 0xFF) is NaN. All other max-exponent patterns are normal numbers.

OCP NaN convention: In standard IEEE 754, any non-zero mantissa at max exponent is NaN (giving many NaN encodings). In OCP E4M3, only the all-ones mantissa is NaN. This reclaims the other patterns as usable normal values, increasing the format's representable range.
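Putting the normal, subnormal, and special-value rules together gives a complete decoder. This is a minimal sketch written from the rules in this section rather than taken from any library; enumerating all 256 encodings then confirms the NaN patterns and the ±448 range.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one OCP E4M3 byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan                                      # S 1111 111
    if exponent == 0:
        return sign * 2.0 ** -6 * (mantissa / 8)             # zero and subnormals
    return sign * 2.0 ** (exponent - 7) * (1 + mantissa / 8)  # normals


values = [decode_e4m3(b) for b in range(256)]
nan_codes = [b for b, v in enumerate(values) if math.isnan(v)]
finite = [v for v in values if not math.isnan(v)]

print([hex(b) for b in nan_codes])  # ['0x7f', '0xff']
print(max(finite), min(finite))     # 448.0 -448.0
print(decode_e4m3(0x78))            # 256.0 (S 1111 000, infinity in an IEEE-style layout)
```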

Where FP8 E4M3 Is Used

NVIDIA Hopper Tensor Cores: The H100 GPU introduced FP8 Tensor Cores that deliver roughly 2,000 dense TFLOPS (about 4,000 with sparsity), doubling throughput vs FP16. E4M3 is the precision-optimized variant for forward-pass weights and activations (max ±448, no infinities).
Transformer Engine: NVIDIA's Transformer Engine FP8 primer documents E4M3 as the forward-pass format in the FP8 training recipe, using delayed scaling or MXFP8 block scaling to manage dynamic range.
ML frameworks: PyTorch registers torch.float8_e4m3fn as a native dtype. The torchao quantization library uses it directly in Float8DynamicActivationFloat8WeightConfig pipelines.
OCP Microscaling: The OCP MX Spec v1.0 defines E4M3 as an element type in MXFP8 blocks (32 elements with an E8M0 shared scale), extending effective dynamic range beyond standalone 4-exponent-bit limits.
Kernel libraries: CUTLASS defines float_e4m3_t with PTX inline asm for saturation-mode conversion. Triton exports float8e4nv as a kernel-programmable dtype.
NumPy ecosystem: The ml_dtypes library provides float8_e4m3fn (and variants float8_e4m3fnuz, float8_e4m3b11fnuz) as NumPy dtype extensions for JAX and TensorFlow. The ONNX proto defines FLOAT8E4M3FN = 17.