How does a Java runtime targeting pre-SSE2 processors implement floating-point basic operations?
Problem description
How does (or did) a Java runtime targeting an Intel processor without SSE2 deal with floating-point denormals when strictfp is set?
Even when the 387 FPU is set for 53-bit precision, it keeps an oversized exponent range that:
- forces detecting underflow/overflow at each intermediate result, and
- makes it hard to avoid double rounding of denormal results.
Strategies include re-computing the operation that resulted in a denormal value with emulated floating-point, or a permanent exponent offset along the lines of the technique used to equip OCaml with 63-bit floats, borrowing a bit from the exponent in order to avoid double rounding.
In any case, I see no way to avoid at least one conditional branch for each floating-point computation, unless the operation can be statically determined not to underflow/overflow. How exceptional (overflow/underflow) cases are dealt with is part of my question, but it cannot be separated from the question of representation (the permanent exponent offset strategy, for instance, seems to mean that only overflows need to be checked for).
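The double-rounding problem in the second bullet can be reproduced without an x87. This is an illustration under assumptions of my own, not anything a JVM does. Ordinary Java `double` arithmetic (IEEE binary64, as on SSE2 hardware and in every JVM since Java 17) stands in for correct rounding. Temporarily scaling one operand by 2^200 stands in for the x87's wider exponent range. The class name and the operands are hypothetical choices picked so that the exact product sits just above a halfway point of the subnormal grid.

```java
// Hypothetical demo: plain binary64 arithmetic plays the role of a correctly
// rounding FPU; scaling by 2^200 plays the role of the x87's extra exponent range.
public final class DoubleRoundingDemo {
    // X = 1 + 2^-52 and Y = (2^51 + 2) * 2^-1074 (a subnormal), so the exact
    // product is (2^51 + 2.5 + 2^-51) * 2^-1074: just above a halfway point.
    static final double X = Math.nextUp(1.0);
    static final double Y = Double.longBitsToDouble(0x0008000000000002L);

    // One correctly rounded multiply, straight onto the subnormal grid.
    static double directProduct() {
        return X * Y;
    }

    // Round to 53 significant bits while the product is still normal, then scale
    // back down, which rounds a second time onto the subnormal grid.
    static double twoStepProduct() {
        double wide = X * (Y * 0x1p200); // Y * 2^200 is exact; this rounds to 53 bits
        return wide * 0x1p-200;          // second rounding, to subnormal precision
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(directProduct())));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(twoStepProduct())));
    }
}
```

`directProduct()` returns (2^51 + 3) · 2^-1074, the correctly rounded result. `twoStepProduct()` returns (2^51 + 2) · 2^-1074. The first rounding lands exactly on a halfway point, and ties-to-even then picks the wrong neighbour.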
Answer
It looks to me, from a very trivial test case, like the JVM round-trips every double computation through memory to get the rounding it wants. It also seems to do something weird with a couple of magic constants. Here's what it did for me for a simple "compute 2^n naively" program:
0xb1e444b0: fld1
0xb1e444b2: jmp 0xb1e444dd ;*iload
; - fptest::calc@9 (line 6)
0xb1e444b7: nop
0xb1e444b8: fldt 0xb523a2c8 ; {external_word}
0xb1e444be: fmulp %st,%st(1)
0xb1e444c0: fmull 0xb1e44490 ; {section_word}
0xb1e444c6: fldt 0xb523a2bc ; {external_word}
0xb1e444cc: fmulp %st,%st(1)
0xb1e444ce: fstpl 0x10(%esp)
0xb1e444d2: inc %esi ; OopMap{off=51}
;*goto
; - fptest::calc@22 (line 6)
0xb1e444d3: test %eax,0xb3f8d100 ; {poll}
0xb1e444d9: fldl 0x10(%esp) ;*goto
; - fptest::calc@22 (line 6)
0xb1e444dd: cmp %ecx,%esi
0xb1e444df: jl 0xb1e444b8 ;*if_icmpge
; - fptest::calc@12 (line 6)
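I believe 0xb523a2c8 and 0xb523a2bc are _fpu_subnormal_bias1 and _fpu_subnormal_bias2 from the HotSpot source code. _fpu_subnormal_bias1 looks to be 0x03ff8000000000000000 and _fpu_subnormal_bias2 looks to be 0x7bff8000000000000000, i.e. the 80-bit values 2^-15360 and 2^15360. _fpu_subnormal_bias1 has the effect of scaling the smallest normal double to the smallest normal long double. If the FPU rounds to 53 bits, the "right thing" will happen: a product that would be subnormal as a double is then subnormal as a long double too, so it is rounded only once, at the same granularity as a double subnormal, and _fpu_subnormal_bias2 then scales it back exactly.
I'd speculate that the seemingly pointless test instruction is there so that the thread can be interrupted by marking that page unreadable in the event that a GC is necessary.
Here's the Java code:
import java.io.*;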
public strictfp class fptest {
public static double calc(int k) {
double a = 2.0;
double b = 1.0;
for (int i = 0; i < k; i++) {
b *= a;
}
return b;
}
public static double intest() {
double d = 0;
for (int i = 0; i < 4100; i++) d += calc(i);
return d;
}
public static void main(String[] args) throws Exception {
for (int i = 0; i < 100; i++)
System.out.println(intest());
}
}
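The biasing argument above can be checked with exact arithmetic instead of real x87 instructions. This is a minimal sketch, and every modelling choice in it is an assumption, not HotSpot code:

- Values are positive and overflow is ignored.
- The x87 under 53-bit precision control is modelled as rounding to 53 significant bits with gradual underflow below 2^-16382.
- A store to a `double` is modelled as the same rounding with the floor at 2^-1022.
- The class and method names are hypothetical.

The sketch decodes the two 80-bit patterns quoted above and replays the multiply sequence from the listing (scale by bias1, multiply, scale by bias2, store). It uses a product that lands just above a halfway point of the double subnormal grid.

```java
import java.math.BigInteger;

// Hypothetical model (not HotSpot code): every value is an exact positive dyadic
// rational m * 2^e, and each FPU step is an explicit round-to-nearest-even.
public final class BiasedMultiplyModel {
    static final class Dyadic {
        final BigInteger m;
        final int e;
        Dyadic(BigInteger m, int e) { this.m = m; this.e = e; }
    }

    // Positive, finite double -> exact dyadic value.
    static Dyadic of(double d) {
        long bits = Double.doubleToRawLongBits(d);
        int field = (int) (bits >>> 52) & 0x7ff;
        long frac = bits & ((1L << 52) - 1);
        return field == 0
                ? new Dyadic(BigInteger.valueOf(frac), -1074)                  // subnormal
                : new Dyadic(BigInteger.valueOf(frac | (1L << 52)), field - 1075);
    }

    // 80-bit pattern as 20 hex digits: 16-bit sign/exponent (bias 16383), then a
    // 64-bit significand with an explicit integer bit. Sign assumed positive.
    static Dyadic ofX87(String hex) {
        int field = Integer.parseInt(hex.substring(0, 4), 16) & 0x7fff;
        return new Dyadic(new BigInteger(hex.substring(4), 16), field - 16383 - 63);
    }

    static Dyadic mul(Dyadic a, Dyadic b) {
        return new Dyadic(a.m.multiply(b.m), a.e + b.e);                      // exact
    }

    // Round to p significant bits; below 2^minExp the ulp stays 2^(minExp - p + 1)
    // (gradual underflow). Overflow is deliberately not modelled.
    static Dyadic round(Dyadic v, int p, int minExp) {
        int lead = v.e + v.m.bitLength() - 1;          // exponent of the leading bit
        int ulpExp = Math.max(lead, minExp) - (p - 1);
        int shift = ulpExp - v.e;
        if (shift <= 0) return v;                      // already on the grid
        BigInteger q = v.m.shiftRight(shift);
        BigInteger r = v.m.subtract(q.shiftLeft(shift));
        int cmp = r.compareTo(BigInteger.ONE.shiftLeft(shift - 1));
        if (cmp > 0 || (cmp == 0 && q.testBit(0))) q = q.add(BigInteger.ONE);
        return new Dyadic(q, ulpExp);
    }

    // ASSUMPTION: x87 with 53-bit precision control and a 15-bit exponent.
    static Dyadic x87(Dyadic v) { return round(v, 53, -16382); }
    // fstpl: rounding to an IEEE double.
    static Dyadic store(Dyadic v) { return round(v, 53, -1022); }

    static double toDouble(Dyadic v) { return Math.scalb(v.m.doubleValue(), v.e); }

    static final Dyadic BIAS1 = ofX87("03ff8000000000000000");   // 2^-15360
    static final Dyadic BIAS2 = ofX87("7bff8000000000000000");   // 2^+15360

    // Plain x87 multiply followed by a store: the product is rounded twice.
    static double naive(double a, double b) {
        return toDouble(store(x87(mul(of(a), of(b)))));
    }

    // Same order as the listing: dst *= bias1; dst *= src; dst *= bias2; store.
    static double biased(double a, double b) {
        Dyadic t = x87(mul(of(a), BIAS1));
        t = x87(mul(t, of(b)));
        t = x87(mul(t, BIAS2));
        return toDouble(store(t));
    }

    public static void main(String[] args) {
        double x = Math.nextUp(1.0);                               // 1 + 2^-52
        double y = Double.longBitsToDouble(0x0008000000000002L);   // (2^51 + 2) * 2^-1074
        System.out.println(naive(x, y) == x * y);
        System.out.println(biased(x, y) == x * y);
    }
}
```

With these inputs, `naive` disagrees with Java's own `x * y` because rounding twice drops it onto the even neighbour, while `biased` agrees, so the program prints `false` then `true`. The model only shows that the bias constants line the subnormal grids up. Whether a real x87 rounds tiny results under precision control exactly as modelled is an assumption here.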
Digging further, the code for these operations is in plain sight in the OpenJDK code in hotspot/src/cpu/x86/vm/x86_32.ad. Relevant snippets:
instruct strictfp_mulD_reg(regDPR1 dst, regnotDPR1 src) %{
predicate( UseSSE<=1 && Compile::current()->has_method() && Compile::current()
->method()->is_strict() );
match(Set dst (MulD dst src));
ins_cost(1); // Select this instruction for all strict FP double multiplies
format %{ "FLD StubRoutines::_fpu_subnormal_bias1\n\t"
"DMULp $dst,ST\n\t"
"FLD $src\n\t"
"DMULp $dst,ST\n\t"
"FLD StubRoutines::_fpu_subnormal_bias2\n\t"
"DMULp $dst,ST\n\t" %}
opcode(0xDE, 0x1); /* DE C8+i or DE /1*/
ins_encode( strictfp_bias1(dst),
Push_Reg_D(src),
OpcP, RegOpc(dst),
strictfp_bias2(dst) );
ins_pipe( fpu_reg_reg );
%}
instruct strictfp_divD_reg(regDPR1 dst, regnotDPR1 src) %{
predicate (UseSSE<=1);
match(Set dst (DivD dst src));
predicate( UseSSE<=1 && Compile::current()->has_method() && Compile::current()
->method()->is_strict() );
ins_cost(01);
format %{ "FLD StubRoutines::_fpu_subnormal_bias1\n\t"
"DMULp $dst,ST\n\t"
"FLD $src\n\t"
"FDIVp $dst,ST\n\t"
"FLD StubRoutines::_fpu_subnormal_bias2\n\t"
"DMULp $dst,ST\n\t" %}
opcode(0xDE, 0x7); /* DE F8+i or DE /7*/
ins_encode( strictfp_bias1(dst),
Push_Reg_D(src),
OpcP, RegOpc(dst),
strictfp_bias2(dst) );
ins_pipe( fpu_reg_reg );
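%}
I see nothing for addition and subtraction, but I'd bet they just do an add/subtract with the FPU in 53-bit mode and then round-trip the result through memory. I'm a little curious whether there's a tricky overflow case that they get wrong, but I'm not curious enough to find out.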
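The guess that addition and subtraction need no bias constants is at least consistent with a standard property of binary floating point. If the exact sum of two doubles has magnitude below the smallest normal double, it is an integer multiple of 2^-1074 and therefore exactly representable. A subnormal sum involves no rounding at all, so it cannot be double-rounded. The sketch below is a quick randomized check of that property. The class name, seed, trial count and operand range are arbitrary choices. It says nothing about the overflow end, where the answer suspects a tricky case.

```java
import java.math.BigDecimal;
import java.util.Random;

// Hypothetical check: operands are random integer multiples of 2^-1074 with
// magnitude at most 2^-1021 (subnormals and the smallest normals).
public final class SubnormalSumCheck {
    // Returns {subnormal-or-zero sums checked, sums that were not exact}.
    static int[] run(long seed, int trials) {
        Random rnd = new Random(seed);
        int checked = 0, inexact = 0;
        for (int i = 0; i < trials; i++) {
            // A signed 54-bit integer times 2^-1074 is always exactly representable.
            double x = Math.scalb((double) (rnd.nextLong() >> 10), -1074);
            double y = Math.scalb((double) (rnd.nextLong() >> 10), -1074);
            double s = x + y;
            if (Math.abs(s) < Double.MIN_NORMAL) {      // subnormal (or zero) result
                checked++;
                BigDecimal exact = new BigDecimal(x).add(new BigDecimal(y));
                if (exact.compareTo(new BigDecimal(s)) != 0) inexact++;
            }
        }
        return new int[] {checked, inexact};
    }

    public static void main(String[] args) {
        int[] r = run(42L, 10_000);
        System.out.println(r[0] + " subnormal sums checked, " + r[1] + " inexact");
    }
}
```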