FP4 E2M1 (OCP) | Floating Point Format Guide

Bit Layout

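FP4 E2M1 packs a floating-point number into just 4 bits: 1 sign bit, 2 exponent bits, and 1 mantissa bit, ordered from most to least significant. That gives only 16 bit patterns (8 with the sign bit clear, 8 with it set). Because +0 and -0 are separate encodings, there are 15 distinct numeric values.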
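As a small illustration of the layout, the sketch below splits a 4-bit code into its three fields. It assumes the usual sign, exponent, mantissa order from the most significant bit down; the function name split_fields is hypothetical and not taken from any library.

```python
def split_fields(code: int) -> tuple[int, int, int]:
    """Split a 4-bit FP4 E2M1 code into (sign, exponent, mantissa).

    Assumed bit order, most significant first: S EE M.
    """
    if not 0 <= code <= 0xF:
        raise ValueError("code must fit in 4 bits")
    sign = (code >> 3) & 0b1       # bit 3
    exponent = (code >> 1) & 0b11  # bits 2..1
    mantissa = code & 0b1          # bit 0
    return sign, exponent, mantissa
```

For example, split_fields(0b1011) returns (1, 1, 1): sign set, exponent field 1, mantissa bit set.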

Overview

FP4 E2M1 is defined in the OCP Microscaling Specification as the smallest element format for microscaling blocks. At 4 bits per element, it achieves 8× compression compared to FP32.

With only 1 mantissa bit, each power-of-2 interval [2^k, 2^(k+1)) contains exactly 2 values: 1.0 × 2^k and 1.5 × 2^k (the significand, including the implicit leading 1, is either 1.0 or 1.5). For example, between 1 and 2 the only representable values are 1.0 and 1.5; between 2 and 4 they are 2.0 and 3.0.

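Unlike the OCP FP8 formats, FP4 E2M1 has no encodings for infinity or NaN (the same is true of the OCP FP6 formats). All 16 bit patterns map to finite numbers. In microscaling blocks it is always paired with a shared per-block scale (a power-of-two exponent in MX blocks) that extends its effective range.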
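To make the role of the shared scale concrete, here is a minimal sketch of decoding a block of FP4 codes with one power-of-two scale. The names FP4_MAGNITUDES and decode_block are hypothetical, and shared_exp is taken as a plain unbiased integer for simplicity; real block formats store the scale in their own encoding.

```python
# Magnitudes indexed by the low 3 bits (exponent and mantissa) of a code.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_block(shared_exp: int, codes: list[int]) -> list[float]:
    """Decode FP4 codes that share one power-of-two scale 2**shared_exp."""
    scale = 2.0 ** shared_exp  # assumption: unbiased integer exponent
    values = []
    for code in codes:
        magnitude = FP4_MAGNITUDES[code & 0b0111]
        sign = -1.0 if code & 0b1000 else 1.0
        values.append(sign * magnitude * scale)
    return values
```

With shared_exp = -2, the element value 6.0 decodes to 1.5 and 0.5 decodes to 0.125, so the same 4-bit codes can cover a different range in every block.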

Only 16 values: you can memorize the entire format! The 8 non-negative values are 0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, and 6.0. The negative values mirror these. That's all there is.
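Because the value set is so small, rounding a real number to FP4 can be sketched as a nearest-value search over those 8 magnitudes. The sketch below assumes round-to-nearest with ties going to the code whose mantissa bit is 0, and assumes out-of-range inputs saturate to ±6.0; both are choices made for this illustration, and encode_fp4 is a hypothetical name.

```python
import math

# Magnitudes indexed by the low 3 bits (exponent and mantissa) of a code.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def encode_fp4(x: float) -> int:
    """Round x to the nearest FP4 E2M1 value and return its 4-bit code."""
    if math.isnan(x):
        raise ValueError("FP4 E2M1 has no NaN encoding")
    sign_bit = 0b1000 if math.copysign(1.0, x) < 0 else 0
    # Assumption: values beyond the largest magnitude saturate to it.
    magnitude = min(abs(x), FP4_MAGNITUDES[-1])
    # Nearest magnitude wins; on an exact tie prefer the even code
    # (mantissa bit 0), which is round-half-to-even on this grid.
    best = min(
        range(8),
        key=lambda i: (abs(FP4_MAGNITUDES[i] - magnitude), i & 1),
    )
    return sign_bit | best
```

For instance, 0.75 lies exactly between 0.5 and 1.0 and rounds to 1.0 (code 0b0010), while 100.0 saturates to 6.0 (code 0b0111).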

Encoding Rules

Normal Numbers

value = (-1)^sign × 2^(exponent - 1) × (1 + mantissa / 2)

This formula applies when the exponent field is 1, 2, or 3. With 1 mantissa bit, there are only 2 values per power-of-2 interval: 1.0 and 1.5 (times the power of 2). The bias of 1 is calculated as 2^(e-1) - 1, where e is the number of exponent bits; for e = 2 this gives 2^1 - 1 = 1.

Subnormal Numbers

value = (-1)^sign × 2⁰ × (mantissa / 2)

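Subnormals use an exponent field of 0 and drop the implicit leading 1; the scale 2⁰ comes from 2^(1 - bias). The only nonzero subnormal magnitude is 0.5 (mantissa = 1). With mantissa = 0 the same formula gives zero.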
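The normal and subnormal rules above can be combined into one small decoder. This is a sketch for illustration only: decode_fp4 is a hypothetical name, and the code assumes the sign bit is the most significant bit, followed by the exponent and then the mantissa.

```python
def decode_fp4(code: int) -> float:
    """Decode a 4-bit FP4 E2M1 code (0..15) to a Python float."""
    if not 0 <= code <= 0xF:
        raise ValueError("code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal (or zero): no implicit leading 1, fixed scale 2^0.
        return sign * 2.0 ** 0 * (mantissa / 2)
    # Normal: implicit leading 1, exponent bias of 1.
    return sign * 2.0 ** (exponent - 1) * (1 + mantissa / 2)
```

Calling decode_fp4 on the codes 0 through 7 yields 0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, and codes 8 through 15 give the negated values.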

Complete Value Table

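Here are all 16 encodings of FP4 E2M1 and the values they represent:

Bits   Sign  Exponent  Mantissa  Value
0000   0     00        0         +0
0001   0     00        1         +0.5 (subnormal)
0010   0     01        0         +1.0
0011   0     01        1         +1.5
0100   0     10        0         +2.0
0101   0     10        1         +3.0
0110   0     11        0         +4.0
0111   0     11        1         +6.0
1000   1     00        0         -0
1001   1     00        1         -0.5 (subnormal)
1010   1     01        0         -1.0
1011   1     01        1         -1.5
1100   1     10        0         -2.0
1101   1     10        1         -3.0
1110   1     11        0         -4.0
1111   1     11        1         -6.0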

Dynamic Range & Precision

All Representable Values

Since there are only 16 possible bit patterns, the complete enumeration in increasing order is: -6, -4, -3, -2, -1.5, -1, -0.5, -0, +0, 0.5, 1, 1.5, 2, 3, 4, 6.

Format Comparison

Where FP4 E2M1 Is Used

OCP Microscaling (MXFP4): The OCP MX Spec v1.0 defines E2M1 as the element type in MXFP4 blocks, where 32 elements share one 8-bit E8M0 scale. A block therefore costs 4.25 bits per element, roughly 7.5× smaller than FP32 (8× before counting the scale).
NVIDIA Blackwell (NVFP4): Blackwell GPUs add native FP4 Tensor Core support. NVIDIA's Transformer Engine implements an NVFP4 training recipe using 16-element blocks with E4M3 scales and stochastic rounding.
NVIDIA CUDA: The CUDA Math API defines __nv_fp4_e2m1 with constructors and conversion operators. CUTLASS provides both mx_float4_t (OCP, 32-element blocks) and nv_float4_t (NVIDIA, 16-element blocks).
ML frameworks: A PyTorch RFC proposes torch.float4_e2m1fn_x2 (2 packed values per byte) with E8M0 block scaling. The ONNX proto defines FLOAT4E2M1 with LSB-first packing.
AMD and compiler support: AMD Quark provides OCP_MXFP4Spec for MXFP4 quantization on ROCm. The MLIR AMDGPU dialect supports f4E2M1FN in WMMA instructions on gfx12/RDNA4.
Python libraries: The ml_dtypes library provides float4_e2m1fn as a NumPy custom dtype (possible values: 0, 0.5, 1, 1.5, 2, 3, 4, 6).
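
Several of the entries above store two FP4 values per byte. As a rough sketch of LSB-first packing (the first element in the low nibble, the second in the high nibble), the helpers below pack and unpack raw 4-bit codes. The names pack_fp4 and unpack_fp4 are hypothetical, and padding an odd-length input with a zero nibble is an assumption of this sketch rather than a rule taken from any of the libraries listed.

```python
def pack_fp4(codes: list[int]) -> bytes:
    """Pack 4-bit codes two per byte, first element in the low nibble."""
    if len(codes) % 2:
        codes = codes + [0]  # assumption: pad odd lengths with a zero nibble
    packed = bytearray()
    for low, high in zip(codes[0::2], codes[1::2]):
        packed.append(((high & 0xF) << 4) | (low & 0xF))
    return bytes(packed)

def unpack_fp4(data: bytes, count: int) -> list[int]:
    """Unpack `count` 4-bit codes from LSB-first packed bytes."""
    codes = []
    for byte in data:
        codes.append(byte & 0xF)  # low nibble holds the earlier element
        codes.append(byte >> 4)   # high nibble holds the later element
    return codes[:count]
```

Packing the codes [0x1, 0x7, 0xA] gives the two bytes 0x71 and 0x0A, and unpacking them with count = 3 recovers the original list.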