Problem description
rootn (float_t x, int_t n) is a function that computes the n-th root x^(1/n) and is supported by some programming languages such as OpenCL. When IEEE-754 floating-point numbers are used, efficient low-accuracy starting approximations for any n can be generated based on simple manipulation of the underlying bit pattern, assuming only normalized operands x need to be processed.
The binary exponent of rootn (x, n) will be 1/n of the binary exponent of x. The exponent field of an IEEE-754 floating-point number is biased. Instead of un-biasing the exponent, dividing it, and re-biasing the result, we can simply divide the biased exponent by n, then apply an offset to compensate for the previously neglected bias. Furthermore, instead of extracting, then dividing, the exponent field, we can simply divide the entire operand x, re-interpreted as an integer. The required offset is trivial to find, since an argument of 1 must return a result of 1 for any n.
If we have two helper functions at our disposal, __int_as_float(), which reinterprets an int32 operand as an IEEE-754 binary32, and __float_as_int(), which reinterprets a binary32 operand as an int32, we arrive at the following low-accuracy approximation to rootn (x, n) in straightforward fashion:
rootn (x, n) ~= __int_as_float((int)(__float_as_int(1.0f)*(1.0-1.0/n)) + __float_as_int(x)/n)
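The helpers and the generic formula above can be sketched in portable C. This is my sketch, not the OpenCL implementation: the names float_as_int/int_as_float avoid reserved double-underscore identifiers, and memcpy is used for the reinterpretation because pointer casts would violate strict aliasing.

```c
#include <string.h>

/* Reinterpret the bits of a binary32 as int32 and vice versa; memcpy
   is the strict-aliasing-safe way to do this in C. */
static int float_as_int(float f) { int i; memcpy(&i, &f, sizeof i); return i; }
static float int_as_float(int i) { float f; memcpy(&f, &i, sizeof f); return f; }

/* Low-accuracy rootn approximation from the text. It works for both
   positive and negative n, because C integer division truncates toward
   zero, so float_as_int(x)/n subtracts the scaled bit pattern when n < 0. */
static float rootn_approx(float x, int n)
{
    int offset = (int)(float_as_int(1.0f) * (1.0 - 1.0 / n));
    return int_as_float(offset + float_as_int(x) / n);
}
```

Note that the truncating (int) cast can land one ulp below the rounded constants in the worked examples below (e.g. 0x2a555555 instead of 0x2a555556 for n = 3), which is immaterial at this accuracy level.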
The integer division __float_as_int (x) / n can be reduced to a shift or multiplication plus shift by well-known optimizations of integer division by constant divisor. Some worked examples are:
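As an illustration of that well-known optimization, here is a sketch for the divisor 3, using the standard magic constant 0xAAAAAAAB (= ceil(2^33 / 3)) for unsigned division by 3; since the bit patterns of positive finite floats are non-negative, it applies directly to __float_as_int(x)/3:

```c
#include <stdint.h>

/* Unsigned division by 3 via multiply-high and shift: for every 32-bit
   x, floor(x / 3) == (x * 0xAAAAAAAB) >> 33. */
static uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```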
rootn (x, 2) ~= __int_as_float (0x1fc00000 + __float_as_int (x) / 2)  // sqrt (x)
rootn (x, 3) ~= __int_as_float (0x2a555556 + __float_as_int (x) / 3)  // cbrt (x)
rootn (x,-1) ~= __int_as_float (0x7f000000 - __float_as_int (x) / 1)  // rcp (x)
rootn (x,-2) ~= __int_as_float (0x5f400000 - __float_as_int (x) / 2)  // rsqrt (x)
rootn (x,-3) ~= __int_as_float (0x54aaaaaa - __float_as_int (x) / 3)  // rcbrt (x)
With all these approximations, the result will be exact only when x = 2^(n*m) for integer m. Otherwise, the approximation will provide an overestimation compared to the true mathematical result. We can approximately halve the maximum relative error by reducing the offset slightly, leading to a balanced mix of underestimation and overestimation. This is easily accomplished by a binary search for the optimal offset that uses all floating-point numbers in the interval [1, 2) as test cases. Doing so, we find:
rootn (x, 2) ~= __int_as_float (0x1fbb4f2e + __float_as_int(x)/2)  // max rel err = 3.47474e-2
rootn (x, 3) ~= __int_as_float (0x2a51067f + __float_as_int(x)/3)  // max rel err = 3.15547e-2
rootn (x,-1) ~= __int_as_float (0x7ef311c2 - __float_as_int(x)/1)  // max rel err = 5.05103e-2
rootn (x,-2) ~= __int_as_float (0x5f37642f - __float_as_int(x)/2)  // max rel err = 3.42128e-2
rootn (x,-3) ~= __int_as_float (0x54a232a3 - __float_as_int(x)/3)  // max rel err = 3.42405e-2
Some may notice that the computation for rootn (x,-2) is basically the initial portion of Quake's fast inverse square root.
Based on observing the differences between the original raw offset, and the final offset optimized to minimize the maximum relative error, I could formulate a heuristic for the secondary correction and thus the final, optimized, offset value.
However, I am wondering whether it is possible to determine the optimal offset by some closed-form formula, such that the maximum absolute value of the relative error, max (|(approx(x,n) - x^(1/n)) / x^(1/n)|), is minimized for all x in [1,2). For ease of exposition, we can restrict to binary32 (IEEE-754 single-precision) numbers.
I am aware that in general, there is no closed-form solution for minimax approximations; however, I am under the impression that closed-form solutions do exist for polynomial approximations to algebraic functions like the n-th root. In this case, we have a (piecewise) linear approximation.
Here's some Octave (MATLAB) code that computes offsets for positive n assuming the conjectures below. Seems to work for 2 and 3, but I suspect that one of the assumptions breaks down when n is too large. No time to investigate right now.
% finds the best offset for positive n
% assuming that the conjectures below are true
function oint = offset(n)
  % find the worst upward error with no offset
  efrac = (1 / log(2) - 1) * n;
  evec = [floor(efrac) ceil(efrac)];
  rootnvec = 2 .^ (evec / n);
  [abserr i] = max((1 + evec / n) ./ rootnvec);
  relerr = abserr - 1;
  rootnx = rootnvec(i);
  % conjecture: z such that approx(z, n) = 1
  % should have the worst downward error
  fun = @(o) relerr - o / rootnx + (1 / (1 + o * n) ^ (1 / n) - 1);
  oreal = fzero(fun, [0 1]);
  oint = round((127 * (1 - 1 / n) - oreal) * 2 ^ 23);
Partial answer for positive n only -- I'm just going to nibble a bit by conjecturing the worst overestimate, assuming that we don't adjust the offset downward.
Let's define an idealized version of the approximation for x in [1, 2^n).
rootn-A(x, n) = 1 + floor(lg(x))/n + ((x/2^floor(lg(x))) - 1)/n
                    ^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^
                    contribution of    contribution of
                    the exponent       the significand
We want to maximize rootn-A(x, n) / x^(1/n).
It seems experimentally that the maximum occurs when x is a power of two. In this case, the significand term is zero and floor(lg(x)) = lg(x), so we can maximize
(1 + lg(x)/n) / x^(1/n).
Substitute y = lg(x)/n, and we can maximize (1 + y) / 2^y for y in [0, 1) such that n*y is an integer. Dropping the integrality condition, it is a calculus exercise to show that the max of this concave function is at y = 1/log(2) - 1, about 0.4426950408889634. It follows that the maximum for x a power of two is at either x = 2^floor((1/log(2) - 1) * n) or x = 2^ceil((1/log(2) - 1) * n). I conjecture that one of these is in fact the global maximum.
On the underestimation end, it appears that we want x such that the output of rootn(x, n) is 1, at least for small n. More to come later hopefully.