Fast Inverse Square Root Hack: A simplified version
Insights into a simplified version of the famous hack used in a 3D computer game from the 90s.
Single precision floating point numbers
A floating point number f is represented as \(f=s \cdot m \cdot 2^e = s \cdot m \cdot 2^{E-127}\).
Here s is the sign (+ or -), m is the mantissa (the significant digits of the number, with \(1.0 \le m \lt 2.0\)),
e is the exponent, and E is the "biased" exponent, which is what is actually stored in memory.
As an example, \(3=+1.5 \cdot 2^1\), so s=+, m=1.5, e=1 and E=128. A single precision
floating point number (or float32 for short) consists of 32 bits, using 1 bit for the sign s,
8 bits for the biased exponent E, and 23 bits for the mantissa.
Fig. 1: Bit-level representation of a float32 number. The first row shows the sizes of the bit fields
while the second row shows the offsets (zero-based numbering).
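To see this layout in action, here is a minimal C++ sketch (variable names and output format are just for illustration) that extracts the three bit fields from the example value 3.0f:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    const float f = 3.0f; // 3 = +1.5 * 2^1, so E = 128 and m = 1.5
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits)); // interpret the float bits as an integer

    const unsigned s = bits >> 31;          // 1 sign bit (bit 31)
    const unsigned E = (bits >> 23) & 0xFF; // 8 biased exponent bits (bits 23-30)
    const unsigned m = bits & 0x7FFFFF;     // 23 mantissa bits (bits 0-22)

    // prints: s=0 E=128 e=1 m=0x400000
    std::printf("s=%u E=%u e=%d m=0x%X\n", s, E, (int)E - 127, m);
    return 0;
}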
An approximation for 1/√x
We have a floating point number (ignoring the sign bit from now on) \(x=m \cdot 2^e\) and want to
compute \(\frac{1}{\sqrt{x}} =
\frac{1}{\sqrt{m \cdot 2^e}} = \frac{1}{\sqrt{m}} \cdot 2^{-e/2}\). A simple approximation would be to
ignore the mantissa and just care about the
exponent. Then we have \(\frac{1}{\sqrt{x}} \approx 2^{-e/2}\). This approximation is correct if m=1.
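As an example, \(x=4=1.0 \cdot 2^2\) gives \(2^{-2/2}=0.5\), which equals \(1/\sqrt{4}\) exactly. For \(x=3=1.5 \cdot 2^1\), ignoring the mantissa yields \(2^{-1/2} \approx 0.707\), while the true value is \(1/\sqrt{3} \approx 0.577\).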
Implementing the approximation using "bit-magic"
We now implement the approximation \(\frac{1}{\sqrt{x}} \approx 2^{-e/2} = 2^{-(E-127)/2} =
2^{-E/2+127/2}\) in C++. As already
mentioned, only the biased exponent E is stored in memory, and the bias term -127 is assumed implicitly.
We will only use one integer subtraction, one bit-shift, and one bit-mask, and we call this procedure
"bit-magic" from now on. The steps are as follows (illustrated in Fig. 2):
- Interpret the bits of the float32 number as an integer to allow for integer subtraction and bit manipulation in C++
- Clear (set to zero) all bits except for the exponent field
- The desired value of the exponent is \(-e/2 = -E/2+127/2\)
- Divide the integer representation by 2 to get \(\lfloor E/2 \rfloor\) (the brackets indicate integer division, e.g. \(\lfloor 4/2 \rfloor=2\) and \(\lfloor 3/2 \rfloor=1\))
- Instead of an integer division by 2, we can also use a bitwise right-shift of the integer (and use a mask to clear the one bit shifted out of the exponent field)
- float32 uses the implicit bias term, so even though in memory we have a value of \(\lfloor E/2 \rfloor\), this actually represents the exponent \(\lfloor E/2 \rfloor -127\)
- We have to correct the value in the exponent field by a value X such that the following holds: \(-e/2 = -E/2+127/2 = X - \lfloor E/2 \rfloor - 127\)
- Ignoring the integer division for a moment (i.e. ignoring the brackets of \(\lfloor E/2 \rfloor\)), X obviously must have the value \(\frac{3 \cdot 127}{2}=190.5\)
- We can't represent 190.5 as an integer, so we use 190 instead; the error introduced by this 0.5 offset is analyzed later
- As the exponent field starts at bit 23, all values have to be shifted 23 bits to the left (e.g., 190<<23); however, as this is not important for the discussion, it is ignored for now
- In summary we have \(e_{M} = 190 - \lfloor E/2 \rfloor - 127\), with \(2^{e_{M}}\) being the approximation computed for \(\frac{1}{\sqrt{x}}\)
Fig. 2: Illustration of the operations applied to the integer representation i of the float32 value.
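To make these steps concrete, take \(x=4=1.0 \cdot 2^2\): the biased exponent is \(E=129\), so \(\lfloor E/2 \rfloor = 64\) and \(e_{M} = 190 - 64 - 127 = -1\), giving \(2^{-1}=0.5=1/\sqrt{4}\).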
Error of the "bit-magic"
The value produced by the "bit-magic" is \(2^{e_{M}}\)
with \(e_{M} = 190 - \lfloor E/2 \rfloor - 127 = - \lfloor(e+127)/2 \rfloor + 63\).
To get rid of the integer division we have to distinguish two cases:
\[e_{M} =
\begin{cases}
-\lfloor (e+127)/2 \rfloor + 63 = -(e+126)/2 + 63 = -e/2 - 63 + 63 = -e/2 & e \text{ even} \\
-\lfloor (e+127)/2 \rfloor + 63 = -(e+127)/2 + 63 = -e/2 - 63.5 + 63 = -e/2 - 0.5 & e \text{ odd}
\end{cases}
\]
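For example, \(e=2\) gives \(e_{M} = -\lfloor 129/2 \rfloor + 63 = -64 + 63 = -1 = -e/2\), while \(e=3\) gives \(e_{M} = -\lfloor 130/2 \rfloor + 63 = -65 + 63 = -2 = -e/2 - 0.5\).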
As can be seen, the "bit-magic" does not always match the approximation \(2^{-e/2}\) we wanted to implement,
but that's still fine as we just need an initial guess for the Newton refinement.
Our implementation matches the approximation for even e and is off by a factor \(2^{-0.5} = 1/\sqrt{2}\) for odd e.
The approximation itself is of course only correct for a mantissa m=1.
Fig. 3 shows exactly this:
for odd e values (x is \(2^1=2\) or \(2^3=8\)) and m=1, the "bit-magic" is too small by a factor of \(1/\sqrt{2}\).
For even e values (x is \(2^0=1\) or \(2^2=4\)) and m=1, the "bit-magic" matches the true value exactly.
Fig. 3: True value of \(1/\sqrt{x}\) in green and approximation computed by "bit-magic" in red
for the interval \(x \in [1, 16]\).
Further, an error is introduced if the mantissa is not exactly 1. The "bit-magic" is piece-wise constant,
with the constant segments starting at odd e values, continuing over the following even e value range, and
ending just before the next odd e value. For an example, see Fig. 3, where a segment starts at \(2^1=2\) and
ends before \(2^3=8\).
The true value is \(y_T = 1/\sqrt{x} = 1/\sqrt{m \cdot 2^e} = 1/\sqrt{m} \cdot 2^{-e/2}\), with e fixed to the odd e
value and m running from 1 to 4. The value computed by the "bit-magic" is
\(y_M = 1/\sqrt{2} \cdot 2^{-e/2}\).
The relative error is \(\epsilon_{rel} = (y_M-y_T)/y_T = \frac{\sqrt{m}}{\sqrt{2}} - 1\).
This is exactly the behaviour that can be observed in the measured relative error shown in Fig. 4:
it starts at \(\sqrt{1/2}-1 \approx -0.29\) (m=1), increases up to \(\sqrt{2}-1 \approx 0.41\) (m close to 4), and then jumps back to ~-0.29 again.
The relative error has a repeating pattern on a logarithmic x-axis.
Fig. 4: Relative error of the "bit-magic" (left) and the Newton refinement (right) for the
interval \(x \in [1, 16]\).
Newton refinement
The "bit-magic" gives a rough estimate for the true value, but can have a relative error of up
to
41%. To improve on this, a small number of Newton iterations is applied. Newton's method computes
the
zero of a function, i.e. it computes y such that \(f(y)=0\). We re-formulate \(y=1/\sqrt{x}\) into a
zero-finding problem, that is \(f(y) = y^2-1/x = 0\). Applying the Newton update
\(y_{k+1}=y_k - f(y_k)/f'(y_k)\) gives a better approximation for y than we had before,
where f'(y) is the first derivative of f(y). The maximum relative error decreases fast:
- 0 iterations: 41%
- 1 iteration: 6%
- 2 iterations: <0.2%
- 3 iterations: <0.0003%
So one or two iterations should be enough for most use-cases.
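Plugging \(f(y)=y^2-1/x\) and \(f'(y)=2y\) into the update rule gives the concrete formula used in the implementation below:
\[y_{k+1} = y_k - \frac{y_k^2 - 1/x}{2 y_k} = \frac{y_k}{2} + \frac{1}{2 x y_k} = \frac{x y_k^2 + 1}{2 x y_k}\]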
C++ implementation
#include <cstdint>
#include <cstring>
float fast_inv_sqrt(float x)
{
    float y = x; // y holds the current guess for 1/sqrt(x)
    uint32_t i;
    std::memcpy(&i, &y, sizeof(i)); // interpret the bits of y as an integer (memcpy avoids the undefined behaviour of pointer casts)
    const uint32_t exp_mask = 0x7F800000; // 0xFF<<23
    const uint32_t magic_number = 0x5F000000; // 190<<23
    // initial guess using magic number
    i = magic_number - ((i >> 1) & exp_mask);
    std::memcpy(&y, &i, sizeof(y)); // write the bits back into the float y
    // refine guess using small number of Newton iterations
    const int num_newton_iter = 2;
    for (int k = 0; k < num_newton_iter; ++k)
    {
        y = (x * y * y + 1) / (2 * x * y);
    }
    return y;
}
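A minimal usage sketch of the function above (the test values are arbitrary; the printed results follow from the error analysis, e.g. for x=4 the initial guess is already exact):

#include <cstdio>
int main()
{
    // with two Newton iterations the results are within ~0.2% of the true values
    std::printf("%f\n", fast_inv_sqrt(4.0f)); // prints 0.500000 (exact: e is even and m=1)
    std::printf("%f\n", fast_inv_sqrt(2.0f)); // prints ~0.708333 (true value: 0.707107)
    return 0;
}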
Conclusion
Compared to the original hack, the version analyzed in this article only uses the exponent bit field of
a float32 number. This gives a relative error that is an order of magnitude worse; however, it also makes
analyzing the "bit-magic" much easier. Using only two Newton iterations drives the worst-case relative
error down to <0.2%. The implementation ignores the sign bit and does not work for denormalized
float32 numbers.
Harald Scheidl, 2020