New method description
The method we use is based on the final refinement step of a Newton-Raphson division iteration [1, 2]. The iteration starts with an accurate approximation r of 1/y and a_0 = ∘(x × r).
The absolute error of a_0 is δ_0 and its relative error is ε_0:
δ_0 = |x/y − ∘(x × r)|
δ_0 = |x/y − x × (1/y)(1 + ε_r)(1 + ε_mult)|
δ_0 = |x/y × (ε_r + ε_mult + ε_r·ε_mult)|
ε_0 = δ_0 / |x/y| ≤ |ε_r + ε_mult + ε_r·ε_mult|
r is assumed to be a correctly rounded approximation of 1/y, and a_0 is also implemented as a correctly rounded multiplication, so ε_mult and ε_r are both bounded by 2^−53, which means:
ε_0 ≤ 2^−53 + 2^−53 + 2^−106 = 2^−52 + 2^−106: the error on a_0 is a little larger than one ulp. Our goal is a final error of less than half an ulp (i.e. a correctly rounded result).
Our method is based on the following (standard) refinement iteration, where each step is a single fused multiply-add (FMA):
e_1 = ∘(x − y × a_0)
a_1 = ∘(a_0 + e_1 × r)
A piece of pythonsollya code available at the end of this article can help measure the accuracy of this method in a few random test cases.
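As a smaller worked illustration, here is a minimal pythonsollya sketch of the error of a_0 and of one refinement step; the two input values are arbitrary, and the helpers mirror those of the full test code given at the end of the article:

```python
import sollya

P, RN = sollya.binary64, sollya.RN
x = sollya.round(0.123456789, P, RN)     # arbitrary sample inputs
y = sollya.round(0.987654321, P, RN)

r = sollya.round(1.0 / y, P, RN)         # correctly rounded reciprocal
a0 = sollya.round(x * r, P, RN)          # correctly rounded multiplication

eps0 = abs(x / y - a0) / abs(x / y)      # relative error of a0 (sollya keeps x/y as an exact expression)
e1 = sollya.round(x - y * a0, P, RN)     # first FMA: single rounding of x - y*a0
a1 = sollya.round(a0 + e1 * r, P, RN)    # second FMA: single rounding of a0 + e1*r

print(eps0)                              # bounded by 2^-52 + 2^-106, as derived above
print(a1)
print(sollya.round(x / y, P, RN))        # a1 should match this correctly rounded quotient
```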
Without special care, this method is not valid for all inputs:
- r overflows if y ≤ 2^−127
- computing r raises DivByZero whenever y = 0, even in cases (e.g. x = NaN or x = 0) where ∘(x/y) would not set that flag
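To make the first point concrete, here is the binary64 analogue in plain Python (these values are the binary64 counterparts of the fp32 example discussed below):

```python
import math

y = math.ldexp(1.0, -1074)   # smallest positive binary64 subnormal
x = math.ldexp(1.0, -1022)   # smallest positive binary64 normal number

r = 1.0 / y                  # 2**1074 is not representable: r = +inf
a0 = x * r                   # +inf as well, although the exact quotient is an ordinary number
print(r, a0, x / y)          # inf inf 4503599627370496.0 (x/y = 2**52 exactly)

# The second point (DivByZero when y = 0) cannot be shown with plain Python
# floats, which raise ZeroDivisionError instead of returning +inf and setting
# the IEEE flag.
```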
Special cases management
Once the numerical behavior is established, we can focus on the behavior for special cases. The table below gives, for each class of inputs x and y, the value of ∘(1/y) and ∘(x/y) and the intermediate values a_0, e_1 and a_1 of the new method (DBZ = DivideByZero flag, IO = Invalid Operation flag).
| x | y | ∘(1/y) | ∘(x/y) | a_0 | e_1 | a_1 |
|---|---|---|---|---|---|---|
| 0 | 0 | ∞ DBZ | qNaN IO | qNaN IO | qNaN | qNaN |
| 0 | Num | Num | 0 | 0 | 0 | 0 |
| 0 | ∞ | 0 | 0 | 0 | 0 | 0 |
| 0 | qNaN | qNaN | qNaN | qNaN | qNaN | qNaN |
| ∞ | 0 | ∞ DBZ | ∞ | ∞ | qNaN IO | qNaN |
| ∞ | Num | Num | ∞ | ∞ | qNaN IO | qNaN IO |
| ∞ | ∞ | 0 | 0 | 0 | 0 | 0 |
| ∞ | qNaN | qNaN | qNaN | qNaN | qNaN | qNaN |
| Num | 0 | ∞ DBZ | ∞ DBZ | ∞ | qNaN IO | qNaN |
| Num | Num | Num (A) | Num (A) | Num | Num | Num |
| Num | ∞ | 0 | 0 | 0 | 0 | 0 |
| Num | qNaN | qNaN | qNaN | qNaN | qNaN | qNaN |
| NaN | 0 | ∞ DBZ | ∞ DBZ | qNaN | qNaN | qNaN |
| NaN | Num | Num (A) | qNaN | qNaN | qNaN | qNaN |
| NaN | ∞ | 0 | qNaN | qNaN | qNaN | qNaN |
| NaN | qNaN | qNaN | qNaN | qNaN | qNaN | qNaN |
However, there are a few cases in the table above where a spurious flag can be raised or an incorrect result returned. We have to carefully consider the flags raised during each step of the new method: ∘(1/y), a_0, e_1, a_1, since these sticky flags accumulate during the computation.
The case (A) requires specific attention. An overflow in ∘(1/y) can occur (with or without the IEEE overflow flag) while x/y remains in the range of normal numbers. For example, in the fp32 format, if y = 2^−149 (the smallest subnormal number) and x = 2^−126 (the smallest normal number), then ∘_RN(1/y) = +∞ but ∘_RN(x/y) = 2^23.
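These fp32 values can be checked from Python, assuming NumPy is available to provide binary32 arithmetic (NumPy is used only for this check, not by the method itself):

```python
import math
import numpy as np

y = np.float32(math.ldexp(1.0, -149))   # smallest fp32 subnormal
x = np.float32(math.ldexp(1.0, -126))   # smallest fp32 normal number

r = np.float32(1.0) / y                 # 2**149 is not representable in fp32: +inf
                                        # (NumPy reports the overflow with a RuntimeWarning)
q = x / y                               # exact: 2**23
print(r, q)                             # inf 8388608.0
```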
To avoid intermediate overflow, underflow, or spurious exceptions, we modify the method as follows:
y′, s = normalize(y) (silent)
x′, t = normalize(x) (silent)
r = ∘(1/y′) (silent)
_ = fsdiv(x, y) (result discarded)
a_0 = ∘(x′ × r)
e_1 = ∘(x′ − y′ × a_0) (silent)
a_1 = ∘(a_0 + e_1 × r) (silent)
n, m = (1, 1) if |x′| > |y′| else (0.5, 2)
R = (a_1 × m) × (n × s × t)
Each operation annotated with (silent) raises no IEEE754 flags or exceptions. For a number y, the normalize function returns the floating-point mantissa y′ of y and the reciprocal s of the scaling factor, such that y′ and s are floating-point numbers, y′ = s × y, 1 ≤ y′ < 2, and s is a power of 2.
If y=±0, y=±∞ or y=NaN then normalize(y)=y,1.0.
normalize does not raise any IEEE754 flags.
Assuming x and y are finite non-zero numbers, 1 ≤ |x′| < 2 and 1 ≤ |y′| < 2, which means 1/2 ≤ |x′/y′| ≤ 2.
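As an illustration only (not the actual implementation), a plain-Python binary64 model of normalize can be built on math.frexp:

```python
import math

def normalize(v):
    """ model of normalize(): return (v', s) with v' = s * v, 1 <= |v'| < 2
        and s a power of two; zeros, infinities and NaN pass through with s = 1.0 """
    if v == 0.0 or math.isinf(v) or math.isnan(v):
        return v, 1.0
    m, e = math.frexp(v)                     # v = m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, math.ldexp(1.0, 1 - e)   # v' = 2*m, s = 2**(1-e)

# caveat of this toy model: for the very smallest binary64 subnormals the
# factor s = 2**(1-e) itself overflows; a real implementation would carry the
# exponent 1-e instead of the power of two (as in the sketch further below).

print(normalize(113.0))                     # (1.765625, 0.015625): 113 = 1.765625 * 2**6
print(normalize(math.ldexp(1.0, -1022)))    # (1.0, 2**1022)
print(normalize(float("inf")))              # (inf, 1.0)
```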
The floating-point seed for division, fsdiv(x, y), is a specific instruction which raises all the special-case IEEE flags as if it had computed ∘(x/y), but does not compute the actual result (nor does it detect and raise flags for numerical overflow or underflow). It can be implemented easily and cheaply (and already exists in some architectures).
Managing overflow and underflow
We must ensure that no loss of precision occurs during the intermediate iterations and that the final result is correctly rounded (even if it is subnormal).
To implement this we rely on normalized inputs x′ and y′ and on normalization factors s and t.
s = 2^p and t = 2^q, with e_min ≤ p, q ≤ e_max (e.g. e_min = −149, e_max = 127 for the fp32 format).
While a_1 remains within [0.5, 2], s × t may overflow or underflow.
Thus we introduce the extra scaling factors n and m, which bring a_1 × m into [1, 2[; if f = n × s × t overflows then R overflows, and reciprocally, if f is smaller than the smallest normal number and a_1 is inexact, then R underflows.
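Putting the pieces together, here is a minimal binary64 sketch of the modified method in plain Python. It assumes Python 3.13+ for math.fma (older versions can substitute the pythonsollya FMA from the code at the end of the article), it does not model fsdiv nor the special values of the table above, and it carries the scaling as an integer exponent applied with a single ldexp instead of the floating-point factors s, t and n; this is a presentation choice for the sketch, not the exact formulation given above.

```python
import math

def normalize(v):
    """ v = v' * 2**e with 1 <= |v'| < 2 (v assumed finite and non-zero) """
    m, e = math.frexp(v)            # 0.5 <= |m| < 1
    return m * 2.0, e - 1

def my_division(x, y):
    yp, q = normalize(y)
    xp, p = normalize(x)
    r = 1.0 / yp                    # 1/2 < |r| <= 1: cannot overflow or underflow
    a0 = xp * r                     # initial approximation of x'/y'
    e1 = math.fma(-yp, a0, xp)      # x' - y'*a0 with a single rounding
    a1 = math.fma(e1, r, a0)        # refined quotient of the mantissas
    if abs(xp) > abs(yp):
        m, shift = 1.0, 0           # a1 already lies in [1, 2)
    else:
        m, shift = 2.0, -1          # bring a1*m into [1, 2] (multiplying by 2 is exact)
    # the remaining power of two is applied in one step, so overflow, underflow
    # and rounding to subnormals can only happen on the final result
    # (note: math.ldexp raises OverflowError instead of returning +inf)
    return math.ldexp(a1 * m, p - q + shift)

print(my_division(math.ldexp(1.0, -1022), math.ldexp(1.0, -1074)))  # 2**52, no spurious infinity
print(my_division(355.0, 113.0))                                    # ~3.14159292, the rounded 355/113
```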
The following code, based on pythonsollya [3], implements a quick and dirty test to check the method's error.
```python
# -*- coding: utf-8 -*-
import sollya
import random

sollya.settings.display = sollya.hexadecimal


def sollya_fma(x, y, z, precision, rndmode=sollya.RN):
    """ implement fused multiply and add using sollya.
        The internal precision must be enough so that x * y
        is computed exactly """
    return sollya.round(x * y + z, precision, rndmode)


def sollya_mul(x, y, precision, rndmode=sollya.RN):
    """ implement multiply using sollya """
    return sollya.round(x * y, precision, rndmode)


def iteration(x, y, approx_div, approx_recp, precision=sollya.binary64):
    """ implementation of refinement iteration """
    e_n = sollya_fma(y, -approx_div, x, precision)
    r_n = sollya_fma(approx_recp, e_n, approx_div, precision)
    return r_n


def my_division(x, y, precision=sollya.binary64, rndmode=sollya.RN):
    """ full division routine """
    approx_recp = sollya.round(1.0 / y, precision, rndmode)
    approx_div = sollya_mul(approx_recp, x, precision)
    NUM_ITER = 1
    for i in range(NUM_ITER):
        approx_div = iteration(x, y, approx_div, approx_recp, precision)
    return approx_div


NUM_TEST = 1000000
PRECISION = sollya.binary64
RNDMODE = sollya.RN

num_error = 0
for i in range(NUM_TEST):
    input_value_x = sollya.round(random.random(), PRECISION, RNDMODE)
    input_value_y = sollya.round(random.random(), PRECISION, RNDMODE)
    expected = sollya.round(input_value_x / input_value_y, PRECISION, RNDMODE)
    result = my_division(input_value_x, input_value_y, PRECISION, RNDMODE)
    error = abs(expected - result)
    rel_error = abs(error / expected)
    if error != 0.0:
        num_error += 1
        print("{} / {} = {}, result is {}, error = {}".format(
            input_value_x, input_value_y, expected, result, error))

print("{} error(s) encountered".format(num_error))
```
References:
- [1] Newton-Raphson Algorithms for Floating-Point Division Using an FMA, JM Muller et al. (pdf)
- [2] Proving the IEEE Correctness of Iterative Floating-Point Square Root, Divide, and Remainder Algorithms, M Cornea-Hassegan (pdf)
- [3] pythonsollya's gitlab (Python wrapper for the Sollya library)