
Computing a floating-point approximation of log(1+x)

The standard functions log1p / log1pf are part of any math library. They compute, respectively, a double-precision and a single-precision approximation of log(1+x) for their double-precision (resp. single-precision) input x. Such functions exist in the standard C library, as well as in C++ ([1]), Python ([2]) and numerous other bindings.

log1p is similar in essence to expm1: it offers a more accurate implementation of a composition of a logarithm (resp. exponential) with a linear function, two operations regularly used together. But it exhibits a new challenge: the implementation must be very accurate around 0 to ensure the expected numerical properties (e.g. faithful rounding).
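To see why this matters, here is a quick check (an illustrative snippet, not part of the original article): around 0, rounding 1 + x before taking the logarithm destroys the low-order bits of x.

>>> import math
>>> x = 1e-10
>>> naive = math.log(1.0 + x)       # 1 + x is rounded first, losing low bits of x
>>> good = math.log1p(x)            # the dedicated function stays accurate near 0
>>> abs(naive / good - 1) > 1e-8    # the naive form loses about 8 digits here
True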

Let us first study possible approximations on two easy intervals before considering a more generic approximation.

Approximation of log(1+x) around 0

In an interval close to 0, let us say \([-\epsilon, \epsilon]\), log(1+x) can be accurately approximated by a small polynomial. Such a polynomial can be obtained with the Remez algorithm, for example.


Using the following piece of code (relying on pythonsollya), we can see that a polynomial of degree only 6 is required to get an approximation of \(\log(1+x)/x\) accurate to 53 bits over \([-2^{-7}, 2^{-7}]\). \(\log(1+x)/x\) is targeted rather than \(\log(1+x)\) because the latter is very close to x around 0; dividing by x lets us search for a polynomial whose first coefficient is non-zero.


>>> import sollya
>>> sollya.guessdegree(sollya.log1p(sollya.x)/sollya.x, sollya.Interval(-2**-7, 2**-7), 2**-53)
[6;6]
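Once the degree is known, the polynomial itself can be generated through pythonsollya too. Below is a minimal sketch, assuming the pythonsollya bindings expose Sollya's fpminimax and dirtyinfnorm commands with their usual arguments:

>>> import sollya
>>> interval = sollya.Interval(-2**-7, 2**-7)
>>> target = sollya.log1p(sollya.x) / sollya.x
>>> # degree-6 minimax approximation with binary64 (double precision) coefficients
>>> poly = sollya.fpminimax(target, 6, [sollya.binary64] * 7, interval)
>>> # quick, non-certified bound on the relative approximation error
>>> sollya.dirtyinfnorm(poly / target - 1, interval)   # expected near 2**-53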

Approximation of log(1+x) for values below -0.5

On \(]-1, -0.5]\), it is easy to see (using Sterbenz's lemma) that the floating-point addition \(1 \oplus x\) is exact, thus one can fall back to \(\log(1 \oplus x)\) to implement log1p(x) accurately, since \(1 \oplus x = 1 + x\).
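This exactness is easy to check empirically; the following sketch (illustrative, using Python's fractions module to build the exact sum) never triggers its assertion:

>>> import random
>>> from fractions import Fraction
>>> for _ in range(100000):
...     x = -random.uniform(0.5, 1.0)   # x in [-1.0, -0.5]
...     # Sterbenz's lemma applies: 1 and |x| are within a factor of 2,
...     # so the floating-point sum 1 + x is exact
...     assert Fraction(1.0 + x) == Fraction(1) + Fraction(x)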

Generic approximation

The floating-point addition \(1 \oplus x\) is not exact for most values of x. In fact, for x between 2 and \(2^{p-1}\), it is only inexact for the few p-bit values matching the pattern \(2^k - 1 + r\), with \(2 \le k < p-1\) and \(0 \le r < 1\).

But we will discard this fact and focus on a more generic approximation, using the floating-point decomposition of x. We will assume the availability of a fast reciprocal operation fast_rcp which returns an approximation of the reciprocal accurate to \(f\) bits. We will use the standard decomposition \(x = m \cdot 2^e\), where m is x's (signed) mantissa and e is its exponent.
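As an illustration (a sketch, not metalibm's actual code), Python's math.frexp provides such a decomposition; it normalizes the fraction to [0.5, 1), so one adjustment yields the \(m \in [1, 2[\) convention used here:

>>> import math
>>> x = 12.75
>>> f, e = math.frexp(x)     # x == f * 2**e with 0.5 <= |f| < 1
>>> m, e = 2.0 * f, e - 1    # renormalize so that x == m * 2**e, 1 <= |m| < 2
>>> (m, e, m * 2**e == x)
(1.59375, 3, True)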

\[ 1 + x = 1 + m \cdot 2^e = 2^e \cdot 2^{-e} + m \cdot 2^e = 2^e \times (m + 2^{-e}) \quad (1) \]

\[ \log(1+x) = e \times \log(2) + \log(m + 2^{-e}) \quad (2) \]

Let us rename \(t = m + 2^{-e}\) and let us consider \(r = \mathrm{fast\_rcp}(t)\), which means \(r \approx \frac{1}{m + 2^{-e}}\).

\[ m + 2^{-e} = \frac{(m + 2^{-e}) \times r}{r} \quad (3) \]
\[ \log(m + 2^{-e}) = \log(t \cdot r) - \log(r) \quad (4) \]

\(r \times 2^e\) is exact, because it is a multiplication by a power of 2.

\[ \log(1+x) = e \times \log(2) + \log(t \cdot r) - \log(r) \quad (5) \]

We would like to tabulate \(\log(r)\), as well as \(\log(2)\); \(\log(t \cdot r)\) can be approximated by a polynomial.
As \(r \approx \frac{1}{t}\), the product \(r \cdot t\) is close to one, with an error corresponding to the accuracy of the reciprocal approximation: \(|1 - r \cdot t| < 2^{-f}\).

However, as e can become large, the range of t is quite wide: it is not possible to limit the table for r to a few hundred entries. We suggest rewriting the equation as follows:

\[ t = m_t \times 2^{e_t}, \quad r = \mathrm{fast\_rcp}(m_t) \quad (6) \]
\[ \log(t) = \log\left(m_t \times 2^{e_t} \times \frac{r}{r}\right) \quad (7) \]
\[ \log(t) = e_t \times \log(2) + \log(m_t \times r) - \log(r) \quad (8) \]

With \(m_t \times r \approx 1\) and \(m_t \in [1, 2[\), we get \(r \in [0.5, 1]\). r takes only a few discrete values across [0.5, 1], determined by the implementation of fast_rcp.
As those values can be known statically, it becomes possible to tabulate \(\log(r)\) easily. For example, if fast_rcp provides an approximation based on the first f bits of \(m_t\)'s mantissa, the table will have \(2^f\) entries and can be indexed by those first f bits (as the mapping between those bits and r is deterministic, it is easy to tabulate \(\log(r)\) for every possible value of r). This is the case, for example, on Intel's IA architecture, where \(f = 12\).
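The sketch below (illustrative only: it uses a toy \(f = 7\) and a truncation-based fast_rcp, neither of which is mandated by the method) shows how such a table can be built and indexed:

import math

F = 7                                    # toy accuracy: 2**7 = 128 table entries

def fast_rcp(mt):
    # toy reciprocal driven by the first F fraction bits of mt (mt in [1, 2[)
    idx = int((mt - 1.0) * 2**F)
    return 1.0 / (1.0 + idx * 2.0**-F)   # one possible value of r per index

# log(r) tabulated once for every possible value of r, indexed the same way
log_r_table = [math.log(1.0 / (1.0 + i * 2.0**-F)) for i in range(2**F)]

mt = 1.5707963
idx = int((mt - 1.0) * 2**F)             # first F bits of mt's mantissa
r = fast_rcp(mt)
assert log_r_table[idx] == math.log(r)   # the lookup recovers log(r)
assert abs(1.0 - mt * r) < 2.0**-F       # |1 - mt * r| < 2**-f as claimed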

   
Once the different terms of Equation (8) are obtained, we can reconstruct the final log(t). We have to be careful with catastrophic cancellation, one of the big caveats to look for in floating-point approximations. In our current implementation, this is handled by using multi-word arithmetic for the final operations, before the final rounding to the destination precision. But this could be further optimized.
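To make the whole scheme concrete, here is a self-contained toy version in pure Python following Equations (1) to (8). It is a sketch under simplifying assumptions (toy \(f = 7\), a plain Taylor polynomial instead of a minimax one, no multi-word arithmetic, and none of the near-0 or \(]-1, -0.5]\) special paths described above), not the code metalibm generates:

import math

F = 7
# tabulated -log(r) for r = 1 / (1 + i * 2**-F), i.e. log(1 + i * 2**-F)
MINUS_LOG_R = [math.log(1.0 + i * 2.0**-F) for i in range(2**F)]
LOG2 = math.log(2.0)

def toy_log1p(x):
    m, e = math.frexp(x)              # x = m * 2**e, 0.5 <= |m| < 1
    m, e = 2.0 * m, e - 1             # renormalize to 1 <= |m| < 2
    t = m + 2.0**-e                   # 1 + x = 2**e * t            (Eq. 1)
    mt, et = math.frexp(t)
    mt, et = 2.0 * mt, et - 1         # t = mt * 2**et              (Eq. 6)
    idx = int((mt - 1.0) * 2**F)      # first F fraction bits of mt
    r = 1.0 / (1.0 + idx * 2.0**-F)   # toy fast_rcp(mt)
    u = mt * r - 1.0                  # mt * r is close to 1, |u| < 2**-F
    # small polynomial for log(1 + u): degree-6 Taylor expansion
    p = u * (1 - u * (1/2 - u * (1/3 - u * (1/4 - u * (1/5 - u * (1/6))))))
    # reconstruction (Eq. 2 and 8): (e + et)*log(2) + log(mt*r) - log(r)
    return (e + et) * LOG2 + p + MINUS_LOG_R[idx]

assert abs(toy_log1p(0.1) - math.log1p(0.1)) < 1e-12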

Source code

The meta-implementation of this method can be found in the metalibm code generator's function library. Below are two command-line examples to generate single- and double-precision source code:


$ python3 metalibm_functions/ml_log1p.py --precision binary32
$ python3 metalibm_functions/ml_log1p.py --precision binary64


Conclusion

There is nothing new in this method, but listing the details here can be useful to understand how a floating-point approximation can be built. Comments are welcome.

Thank you to Hugo B. for proofreading this article. This article, initially published on March 19th 2019, was updated on March 30th 2019 (with typo fixes and clarifications).

References:
     The use of fast_rcp for logarithm implementation was first described in P. Markstein's book, IA-64 and Elementary Functions: Speed and Precision.
