IEEE 754 half-precision floating point: compact and efficient for ML inference and graphics
An FP16 number uses 16 bits divided into three fields: 1 sign bit, 5 exponent bits, and 10 mantissa bits.
The sign bit determines positive (0) or negative (1). The exponent is stored with a bias of 15. The mantissa stores 10 bits of fractional precision.
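The field layout above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original page: the helper name fp16_fields is hypothetical, and it relies on the standard library's struct module, whose "e" format code packs a float as IEEE 754 binary16.

```python
import struct

def fp16_fields(value: float) -> tuple:
    """Return (sign, biased_exponent, mantissa) for value rounded to FP16."""
    # "<e" packs the value as little-endian binary16; "<H" reads the same
    # two bytes back as an unsigned 16-bit integer so the bits can be masked.
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    sign = bits >> 15                # top bit
    exponent = (bits >> 10) & 0x1F   # next 5 bits, still biased by 15
    mantissa = bits & 0x3FF          # low 10 bits
    return sign, exponent, mantissa

print(fp16_fields(1.0))    # (0, 15, 0)
print(fp16_fields(-2.5))   # (1, 16, 256)
```

For -2.5 = -1.25 × 2^1, the sign bit is 1, the biased exponent is 1 + 15 = 16, and the mantissa is 0.25 × 1024 = 256.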
FP16, also known as half precision, was added to the IEEE 754-2008 standard as a storage and interchange format. At just 16 bits, it uses half the memory of FP32, making it attractive for applications where memory bandwidth is a bottleneck.
With only 5 exponent bits, FP16 has a much narrower range than FP32: its largest finite value is only 65,504. Large values (like loss values in deep learning) can therefore easily overflow to infinity. For values within its range, the 10 stored mantissa bits, together with the implicit leading 1 of normal numbers, provide about 3.3 decimal digits of precision.
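Both figures follow from the field widths. The sketch below reproduces the arithmetic; the constant names are illustrative, and it assumes the standard IEEE 754 conventions that the all-ones exponent is reserved for infinity and NaN and that normal numbers carry an implicit leading 1.

```python
import math

EXP_BITS = 5
MANT_BITS = 10
BIAS = 2 ** (EXP_BITS - 1) - 1            # 15

# Largest finite value: biased exponent 30 (31 is reserved for inf/NaN)
# with every mantissa bit set, i.e. (2 - 2**-10) * 2**15.
max_exponent = (2 ** EXP_BITS - 2) - BIAS
max_value = (2 - 2 ** -MANT_BITS) * 2.0 ** max_exponent

# 10 stored bits plus the implicit leading 1 give 11 significant bits.
decimal_digits = (MANT_BITS + 1) * math.log10(2)

print(max_value)                   # 65504.0
print(round(decimal_digits, 2))    # 3.31
```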
FP16 is widely used in ML inference, computer graphics (HDR textures), and as a storage format on GPUs. NVIDIA's Tensor Cores can perform FP16 matrix multiplications at 2× the throughput of FP32.
Normal FP16 numbers have a biased exponent between 1 and 30 (actual exponent -14 to +15). The bias of 15 is calculated as 2^(e-1) - 1, where e is the number of exponent bits (5).
Subnormals bridge the gap between the smallest normal number and zero, enabling gradual underflow.
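The difference between normal and subnormal decoding can be made concrete with a small decoder. This is a sketch under the usual IEEE 754 rules rather than code from the original page: decode_fp16 is a hypothetical helper, and the treatment of biased exponent 0 (no implicit leading 1, exponent fixed at -14) and biased exponent 31 (infinity or NaN) is assumed from the standard.

```python
import struct

BIAS = 15

def decode_fp16(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    sign = -1.0 if bits >> 15 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Subnormal or zero: no implicit leading 1, exponent fixed at -14.
        return sign * (mantissa / 1024) * 2.0 ** (1 - BIAS)
    if exponent == 31:
        # All-ones exponent encodes infinity (mantissa 0) or NaN.
        return sign * float("inf") if mantissa == 0 else float("nan")
    # Normal: implicit leading 1, exponent stored with a bias of 15.
    return sign * (1 + mantissa / 1024) * 2.0 ** (exponent - BIAS)

print(decode_fp16(0x3C00))   # 1.0
print(decode_fp16(0x0400))   # smallest normal, 2**-14
print(decode_fp16(0x0001))   # smallest subnormal, 2**-24

# Cross-check against Python's own binary16 codec for a few bit patterns.
for bits in (0x3C00, 0x0400, 0x0001, 0x03FF, 0x7BFF):
    (expected,) = struct.unpack("<e", struct.pack("<H", bits))
    assert decode_fp16(bits) == expected
```

The largest subnormal (0x03FF) sits one step of 2^-24 below the smallest normal (0x0400), which is the gradual underflow described above.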
The PTX ISA provides .f16 / .f16x2 types. On H100, FP16 Tensor Cores deliver roughly 1000 TFLOPS (2000 with sparsity).
PyTorch provides torch.float16 as a core dtype. The ONNX specification defines FLOAT16 = 10 as an IEEE 754 half-precision type for model interchange.
CUTLASS provides half_t for templated MMA abstractions. Triton exports float16 as a first-class kernel dtype, and the MLIR NVVM dialect maps FP16 to Tensor Core MMA shapes.