Problem description
To what extent can a JIT replace platform-independent code with processor-specific machine instructions?
For example, the x86 instruction set includes the BSWAP instruction to reverse a 32-bit integer's byte order. In Java, the Integer.reverseBytes() method is implemented using multiple bitwise masks and shifts, even though in x86 native code it could be implemented in a single instruction using BSWAP. Are JITs (or static compilers, for that matter) able to make this change automatically, or is it too complex or not worth it due to a poor speed/time tradeoff?
(I know that this is in most cases a micro-optimisation, but I'm interested nonetheless.)
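To make the question concrete, here is a minimal sketch of the shift-and-mask formulation it refers to. The class name ByteSwapSketch and the helper reverseBytesManual are hypothetical, written for illustration only; this is not a copy of the JDK source, just one way to express the same byte swap in portable Java and check it against Integer.reverseBytes().

```java
public class ByteSwapSketch {
    // Hypothetical helper: reverses the four bytes of an int using only
    // shifts and masks, i.e. the platform-independent formulation.
    static int reverseBytesManual(int i) {
        return (i >>> 24)              // byte 3 -> byte 0
             | ((i >> 8) & 0xFF00)     // byte 2 -> byte 1
             | ((i << 8) & 0xFF0000)   // byte 1 -> byte 2
             | (i << 24);              // byte 0 -> byte 3
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x11223344, 0x80000000, Integer.MAX_VALUE};
        for (int v : samples) {
            // The manual version should agree with the library method.
            if (reverseBytesManual(v) != Integer.reverseBytes(v)) {
                throw new AssertionError("mismatch for " + Integer.toHexString(v));
            }
        }
        System.out.println(Integer.toHexString(reverseBytesManual(0x11223344)));
    }
}
```

Running main prints 44332211. On x86 the whole body of the helper corresponds to what a single BSWAP does.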
Recommended answer
For this case, yes: the HotSpot server compiler can perform this optimization. The reverseBytes() methods are registered as vmIntrinsics in HotSpot. When the JIT compiler compiles a call to one of these methods, it generates a special IR node instead of compiling the whole method body, and on x86 this node is translated into 'bswap'. Note that the intrinsic is only used when the target platform has a match rule for the node; otherwise the function below returns false and the ordinary Java implementation is compiled. See src/share/vm/opto/library_call.cpp:
//---------------------------- inline_reverseBytes_int/long/char/short-------------------
// inline Integer.reverseBytes(int)
// inline Long.reverseBytes(long)
// inline Character.reverseBytes(char)
// inline Short.reverseBytes(short)
bool LibraryCallKit::inline_reverseBytes(vmIntrinsics::ID id) {
  assert(id == vmIntrinsics::_reverseBytes_i || id == vmIntrinsics::_reverseBytes_l ||
         id == vmIntrinsics::_reverseBytes_c || id == vmIntrinsics::_reverseBytes_s,
         "not reverse Bytes");
  if (id == vmIntrinsics::_reverseBytes_i && !Matcher::has_match_rule(Op_ReverseBytesI))  return false;
  if (id == vmIntrinsics::_reverseBytes_l && !Matcher::has_match_rule(Op_ReverseBytesL))  return false;
  if (id == vmIntrinsics::_reverseBytes_c && !Matcher::has_match_rule(Op_ReverseBytesUS)) return false;
  if (id == vmIntrinsics::_reverseBytes_s && !Matcher::has_match_rule(Op_ReverseBytesS))  return false;
  _sp += arg_size();  // restore stack pointer
  switch (id) {
  case vmIntrinsics::_reverseBytes_i:
    push(_gvn.transform(new (C, 2) ReverseBytesINode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_l:
    push_pair(_gvn.transform(new (C, 2) ReverseBytesLNode(0, pop_pair())));
    break;
  case vmIntrinsics::_reverseBytes_c:
    push(_gvn.transform(new (C, 2) ReverseBytesUSNode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_s:
    push(_gvn.transform(new (C, 2) ReverseBytesSNode(0, pop())));
    break;
  default:
    ;
  }
  return true;
}
and src/cpu/x86/vm/x86_64.ad:
instruct bytes_reverse_int(rRegI dst) %{
  match(Set dst (ReverseBytesI dst));
  format %{ "bswapl  $dst" %}
  opcode(0x0F, 0xC8);  /* Opcode 0F /C8 */
  ins_encode( REX_reg(dst), OpcP, opc2_reg(dst) );
  ins_pipe( ialu_reg );
%}
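One way to observe this yourself is to call Integer.reverseBytes() in a loop hot enough to be JIT-compiled and then inspect the generated code. The harness below is only a sketch: the class name ReverseBytesWarmup, the method checksum, and the iteration counts are arbitrary choices for illustration, and whether a bswap actually shows up depends on the JVM version, the compiler tier, and the platform.

```java
public class ReverseBytesWarmup {
    // Calls Integer.reverseBytes() in a tight loop so the method becomes
    // hot enough for the JIT compiler to compile it.
    static int checksum(int iterations) {
        int acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc ^= Integer.reverseBytes(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        int result = 0;
        // Repeat the call several times; the counts are arbitrary and only
        // meant to exceed typical compilation thresholds.
        for (int round = 0; round < 20; round++) {
            result += checksum(1_000_000);
        }
        // Print the result so the work cannot be discarded as dead code.
        System.out.println(result);
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which requires the hsdis disassembler plugin to be installed) dumps the compiled code, where you can look for a bswap instruction inside checksum.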