Google Brain's 16-bit format: same range as FP32, designed for deep learning
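A BF16 number uses 16 bits divided into three fields: 1 sign bit, 8 exponent bits, and 7 mantissa (fraction) bits.
BF16 uses the same 8-bit exponent as FP32, giving it the same dynamic range: normal values span roughly 1.2 × 10^-38 to 3.4 × 10^38. The trade-off is a 7-bit mantissa, which provides less precision than FP16's 10-bit mantissa.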
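To make the range and precision trade-off concrete, the sketch below derives each format's largest finite value, smallest normal value, and machine epsilon (the gap between 1.0 and the next representable value) from its exponent and mantissa widths. It assumes IEEE 754-style encodings in which the all-ones exponent is reserved for infinity and NaN; `format_limits` is a hypothetical helper, not a library function.

```python
def format_limits(exp_bits: int, man_bits: int) -> dict:
    """Derive key limits of an IEEE 754-style binary format from its field widths."""
    bias = 2 ** (exp_bits - 1) - 1
    # The all-ones exponent is reserved for inf/NaN, so the largest usable one is 2^e - 2.
    max_exp = (2 ** exp_bits - 2) - bias
    return {
        "max_finite": (2 - 2.0 ** -man_bits) * 2.0 ** max_exp,  # all mantissa bits set
        "min_normal": 2.0 ** (1 - bias),                        # exponent field == 1
        "epsilon": 2.0 ** -man_bits,                            # spacing just above 1.0
    }

# (exponent bits, mantissa bits) for each format
FORMATS = {"FP32": (8, 23), "BF16": (8, 7), "FP16": (5, 10)}

for name, (e, m) in FORMATS.items():
    lim = format_limits(e, m)
    print(f"{name}: max={lim['max_finite']:.4e}  "
          f"min_normal={lim['min_normal']:.4e}  eps={lim['epsilon']:.2e}")
```

BF16 and FP32 share the same smallest normal value (2^-126) and nearly the same maximum, while BF16's epsilon (2^-7) is eight times coarser than FP16's (2^-10).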
BF16 (Brain Floating Point 16) was developed by Google Brain for use in its TPU (Tensor Processing Unit) hardware. It has since been adopted by virtually every major ML hardware vendor, including NVIDIA (Ampere and later), AMD, Intel, and ARM.
The key insight behind BF16 is that deep learning training is more sensitive to dynamic range than to precision. Neural network weights, activations, and gradients can span many orders of magnitude, and FP16's narrow range (largest finite value 65,504) often causes overflow. BF16 solves this by keeping FP32's full exponent range while truncating the mantissa.
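A small sketch of the overflow problem, using only the standard library: FP16 rounding goes through `struct`'s half-precision `"e"` format, and BF16 is produced by keeping the upper 16 bits of the FP32 encoding. The two sample values (a large activation and a tiny gradient) and the helper names are illustrative assumptions.

```python
import math
import struct

def round_to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (FP16); out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct raises instead of returning inf
        return math.copysign(math.inf, x)

def truncate_to_bf16(x: float) -> float:
    """Keep only the upper 16 bits of x's FP32 encoding (BF16 by truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# A large activation and a tiny gradient, both comfortably inside FP32's range.
for value in (70000.0, 1e-8):
    print(f"{value:>8g}  FP16: {round_to_fp16(value):g}  BF16: {truncate_to_bf16(value):g}")
```

Here 70000 overflows to inf in FP16 and 1e-8 underflows to 0, while BF16 keeps both finite and nonzero; truncation turns 70000 into 69632, a small rounding error rather than an overflow.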
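Converting between FP32 and BF16 is simple. FP32 to BF16 keeps the upper 16 bits of the FP32 bit pattern and drops the lower 16 (in practice, hardware and libraries commonly round to nearest even instead of plainly truncating). BF16 to FP32 is exact: append 16 zero bits. This makes BF16 extremely hardware-friendly.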
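Both directions can be sketched with Python's `struct` module to reach the raw bit patterns. The function names are hypothetical; the round-to-nearest-even variant uses the common "add 0x7FFF plus the surviving LSB" trick and special-cases NaN so that a NaN payload cannot round into infinity. Inputs are assumed to be representable as FP32 (`struct` raises `OverflowError` otherwise).

```python
import struct

def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bf16_truncate(x: float) -> int:
    """FP32 -> BF16 by dropping the lower 16 bits (round toward zero)."""
    return fp32_bits(x) >> 16

def bf16_round_nearest_even(x: float) -> int:
    """FP32 -> BF16 with round-to-nearest, ties-to-even on the dropped 16 bits."""
    bits = fp32_bits(x)
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: force a quiet NaN instead of rounding
    lsb = (bits >> 16) & 1  # lowest surviving bit, used to break ties toward even
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF

def bf16_to_fp32(b: int) -> float:
    """BF16 -> FP32 is exact: append 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

x = 70000.0
print(hex(fp32_bits(x)))                                                      # 0x4788b800
print(hex(bf16_truncate(x)), bf16_to_fp32(bf16_truncate(x)))                  # 0x4788 69632.0
print(hex(bf16_round_nearest_even(x)), bf16_to_fp32(bf16_round_nearest_even(x)))  # 0x4789 70144.0
```

The dropped half of 70000's encoding is 0xB800, which is more than halfway (0x8000), so rounding bumps the kept half from 0x4788 to 0x4789 while truncation does not.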
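BF16 follows the same encoding rules as FP32 (sign bit, biased exponent, implicit leading 1 for normal numbers), but with only 7 mantissa bits instead of 23. The exponent bias of 127 is calculated as 2^(e-1) - 1, where e is the number of exponent bits: 2^(8-1) - 1 = 127, identical to FP32.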
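Written out, a normal BF16 number decodes as (-1)^sign × 2^(exponent - 127) × (1 + mantissa/128), with the usual IEEE special cases for an all-zeros exponent (zero and subnormals) and an all-ones exponent (infinity and NaN). A sketch of a decoder under those FP32-style rules; `decode_bf16` is a hypothetical name.

```python
def decode_bf16(b: int) -> float:
    """Decode a 16-bit BF16 pattern using the FP32-style rules (bias 127)."""
    exp_bits, man_bits = 8, 7
    bias = 2 ** (exp_bits - 1) - 1          # 2^(8-1) - 1 = 127, same as FP32
    sign = -1.0 if (b >> 15) & 1 else 1.0
    exponent = (b >> man_bits) & 0xFF       # 8-bit biased exponent field
    mantissa = b & 0x7F                     # 7-bit fraction field
    if exponent == 0xFF:                    # all ones: infinity or NaN
        return float("nan") if mantissa else sign * float("inf")
    if exponent == 0:                       # zero or subnormal: no implicit leading 1
        return sign * (mantissa / 2 ** man_bits) * 2.0 ** (1 - bias)
    return sign * (1 + mantissa / 2 ** man_bits) * 2.0 ** (exponent - bias)

for pattern in (0x3F80, 0xC040, 0x4789, 0x7F7F, 0x0001):
    print(f"0x{pattern:04X} -> {decode_bf16(pattern)!r}")
```

For example, 0xC040 has sign 1, biased exponent 128, and mantissa 64, so it decodes to -(1 + 64/128) × 2^1 = -3.0.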
torch.bfloat16.
BFLOAT16 as a first-class tensor data type, enabling cross-framework model export and inference in BF16.
bfloat16_t type for templated GPU matrix kernels.
Triton maps tl.bfloat16 to MLIR backend targets, and the MLIR arith dialect supports bf16 as a builtin type.
bfloat16 as a NumPy custom dtype for JAX and TensorFlow interop, since NumPy does not yet include a native BF16 type.