TF32 (TensorFloat-32) | Floating Point Format Guide

Bit Layout

A TF32 number uses 19 bits divided into three fields:

TF32 combines the 8-bit exponent of FP32/BF16 with the 10-bit mantissa of FP16, giving it FP32's range with FP16's precision.

Overview

TF32 (TensorFloat-32) was introduced by NVIDIA with the Ampere architecture (A100 GPU) in 2020. Despite the "32" in its name, TF32 is actually a 19-bit format. The name reflects that it is used as a drop-in replacement for FP32 in tensor operations.

TF32 is unique because it's not a storage format: data is stored in FP32 memory layout, and TF32 is only used internally by Tensor Cores during matrix multiply-accumulate operations. The GPU automatically truncates FP32 inputs to TF32 precision (10 mantissa bits) before computation, then accumulates results in FP32.

This gives up to 8× speedup over FP32 matrix math on A100, with negligible accuracy loss for most deep learning workloads. On NVIDIA GPUs, TF32 is enabled by default for torch.matmul and torch.nn.Linear.

Not a storage format You'll never see TF32 tensors in memory. TF32 is a compute mode: the GPU truncates FP32 inputs on-the-fly during Tensor Core operations. Inputs and outputs remain in FP32 format.

Encoding Rules

Normal Numbers

value = (-1)^sign × 2^{(exponent - 127)} × (1 + mantissa / 2¹⁰)

TF32 follows FP32's rules with the exponent, but with only 10 mantissa bits (like FP16). The bias of 127 is calculated as 2^(e-1) - 1, where e is the number of exponent bits (8), identical to FP32 and BF16.

Subnormal Numbers

value = (-1)^sign × 2^-126 × (0 + mantissa / 2¹⁰)

Special Values

Zero: Exponent = 0, Mantissa = 0.
Infinity: Exponent = 255, Mantissa = 0.
NaN: Exponent = 255, Mantissa ≠ 0.

Interactive Value Visualizer

Click any bit to flip it, drag the slider, or enter a decimal or hex value. The graphs show how values are distributed across the encoding space.

Decimal:

Hex:

Dynamic Range & Precision

Special Values & Bit Patterns

Format Comparison

Where TF32 Is Used

NVIDIA Tensor Cores (Ampere+): The Ampere Tuning Guide defines TF32 as the default math mode for A100 and later GPUs. Tensor Cores automatically truncate FP32 inputs to 19-bit TF32, delivering up to 10× speedup over V100 FP32 (20× with sparsity) while reading and writing standard FP32 tensors.
PyTorch and cuBLAS: TF32 matmul is enabled by default in PyTorch 1.12+ via torch.backends.cuda.matmul.allow_tf32. The cuBLASDx framework accepts the proprietary cublasdx::tfloat32_t type for explicit TF32 matrix kernels.
Kernel libraries: CUTLASS defines tfloat32_t as a fundamental numeric type, and the MLIR NVVM dialect supports TF32 MMA fragment layouts for m16n8k4 and m16n8k8 shapes.
PTX ISA: The PTX instruction set documents .tf32 as an alternate floating-point data format with dedicated MMA instruction encodings.

Not a general-purpose format TF32 is only used internally by NVIDIA Tensor Cores during specific operations (matmul, convolutions). You cannot store tensors in TF32 or select it as a dtype in frameworks. If you need a 16-bit storage format with similar precision, use FP16 (same 10-bit mantissa) instead.