Why does a Java switch on contiguous ints appear to run faster with added cases?

Problem description

I am working on some Java code that needs to be highly optimized, as it will run in hot functions that are invoked at many points in my main program logic. Part of this code involves multiplying double variables by 10 raised to arbitrary non-negative int exponents. One fast way (edit: but not the fastest possible, see Update 2 below) to get the multiplied value is to switch on the exponent:

    // Needs: import java.text.ParseException;
    double multiplyByPowerOfTen(final double d, final int exponent) throws ParseException {
        switch (exponent) {
            case 0: return d;
            case 1: return d*10;
            case 2: return d*100;
            // ... same pattern
            case 9: return d*1000000000;
            case 10: return d*10000000000L;
            // ... same pattern with long literals
            case 18: return d*1000000000000000000L;
            default:
                // Any exponent outside 0..18 is rejected.
                throw new ParseException("Unhandled power of ten " + exponent, 0);
        }
    }

The commented ellipses above indicate that the case int constants continue incrementing by 1, so there are really 19 cases in the above code snippet. Since I wasn't sure whether I would actually need all the powers of 10 in case statements 10 through 18, I ran some microbenchmarks comparing the time to complete 10 million operations with this switch statement versus a switch with only cases 0 through 9 (with the exponent limited to 9 or less to avoid breaking the pared-down switch). I got the rather surprising (to me, at least!) result that the longer switch with more case statements actually ran faster.

On a lark, I tried adding even more cases which just returned dummy values, and found that I could get the switch to run even faster with around 22-27 declared cases (even though those dummy cases are never actually hit while the code is running). (Again, cases were added in a contiguous fashion by incrementing the prior case constant by 1.) These execution time differences are not very significant: for a random exponent between 0 and 10, the dummy-padded switch statement finishes 10 million executions in 1.49 secs versus 1.54 secs for the unpadded version, for a grand total savings of 5 ns per execution. So, not the kind of thing that makes obsessing over padding out a switch statement worth the effort from an optimization standpoint. But I still find it curious and counter-intuitive that a switch doesn't become slower (or perhaps at best maintain constant O(1) time) to execute as more cases are added to it.

These are the results I obtained from running with various limits on the randomly generated exponent values. I didn't include the results all the way down to 1 for the exponent limit, but the general shape of the curve remains the same, with a ridge around the 12-17 case mark, and a valley between 18-28.
All tests were run in JUnitBenchmarks using shared containers for the random values to ensure identical testing inputs. I also ran the tests both in order from longest switch statement to shortest, and vice versa, to try to eliminate the possibility of ordering-related test problems. I've put my testing code up on a github repo if anyone wants to try to reproduce these results.

So, what's going on here? Some vagaries of my architecture or micro-benchmark construction? Or is the Java switch really a little faster to execute in the 18 to 28 case range than it is from 11 up to 17?

github test repo "switch-experiment"

UPDATE: I cleaned up the benchmarking library quite a bit and added a text file in /results with some output across a wider range of possible exponent values. I also added an option in the testing code not to throw an Exception from default, but this doesn't appear to affect the results.

UPDATE 2: Found some pretty good discussion of this issue from back in 2009 on the xkcd forum here: http://forums.xkcd.com/viewtopic.php?f=11&t=33524. The OP's discussion of using Array.binarySearch() gave me the idea for a simple array-based implementation of the exponentiation pattern above. There's no need for the binary search since I know what the entries in the array are. It appears to run about 3 times faster than using switch, obviously at the expense of some of the control flow that switch affords. That code has been added to the github repo also.
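The array-based idea from Update 2 could look roughly like the sketch below. The repository code is not reproduced in this post, so everything here is an assumption made for illustration: the class name `PowersOfTen`, the field name `POWERS`, and the choice to throw an unchecked `IllegalArgumentException` instead of the question's `ParseException` (only to keep the sketch short).

```java
// Hypothetical sketch of an array-based replacement for the switch.
// Names are illustrative and are not taken from the question's repository.
final class PowersOfTen {
    // POWERS[i] holds 10^i for i = 0..18, the same range the switch covers.
    private static final double[] POWERS = new double[19];

    static {
        double p = 1.0;
        for (int i = 0; i < POWERS.length; i++) {
            POWERS[i] = p;
            p *= 10.0; // powers of ten in this range are exactly representable as doubles
        }
    }

    static double multiplyByPowerOfTen(final double d, final int exponent) {
        // Plays the role of the switch's default branch.
        if (exponent < 0 || exponent >= POWERS.length) {
            throw new IllegalArgumentException("Unhandled power of ten " + exponent);
        }
        return d * POWERS[exponent];
    }

    public static void main(String[] args) {
        System.out.println(multiplyByPowerOfTen(1.5, 3)); // prints 1500.0
    }
}
```

The lookup replaces the branch selection with one range check and one array read, which is the control-flow trade-off the update mentions: every exponent has to map to "multiply by a constant", whereas a switch could run different code per case.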
Solution

As pointed out by the other answer, because the case values are contiguous (as opposed to sparse), the generated bytecode for your various tests uses a switch table (bytecode instruction tableswitch).

However, once the JIT starts its job and compiles the bytecode into assembly, the tableswitch instruction does not always result in an array of pointers: sometimes the switch table is transformed into what looks like a lookupswitch (similar to an if/else if structure).

Decompiling the assembly generated by the JIT (hotspot JDK 1.7) shows that it uses a succession of if/else if when there are 17 cases or fewer, and an array of pointers (more efficient) when there are 18 or more.

The reason why this magic number of 18 is used seems to come down to the default value of the MinJumpTableSize JVM flag (around line 352 in the code).

I have raised the issue on the hotspot compiler list and it seems to be a legacy of past testing. Note that this default value has been removed in JDK 8 after more benchmarking was performed.

Finally, when the method becomes too long (> 25 cases in my tests), it is no longer inlined with the default JVM settings. That is the likeliest cause for the drop in performance at that point.

With 5 cases, the decompiled code looks like this (notice the cmp/je/jg/jmp instructions, the assembly for if/goto):

    [Verified Entry Point]
      # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
      # parm0: xmm0:xmm0 = double
      # parm1: rdx = int
      # [sp+0x20] (sp of caller)
      0x00000000024f0160: mov DWORD PTR [rsp-0x6000],eax ; {no_reloc}
      0x00000000024f0167: push rbp
      0x00000000024f0168: sub rsp,0x10 ;*synchronization entry ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
      0x00000000024f016c: cmp edx,0x3
      0x00000000024f016f: je 0x00000000024f01c3
      0x00000000024f0171: cmp edx,0x3
      0x00000000024f0174: jg 0x00000000024f01a5
      0x00000000024f0176: cmp edx,0x1
      0x00000000024f0179: je 0x00000000024f019b
      0x00000000024f017b: cmp edx,0x1
      0x00000000024f017e: jg 0x00000000024f0191
      0x00000000024f0180: test edx,edx
      0x00000000024f0182: je 0x00000000024f01cb
      0x00000000024f0184: mov ebp,edx
      0x00000000024f0186: mov edx,0x17
      0x00000000024f018b: call 0x00000000024c90a0 ; OopMap{off=48} ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83) ; {runtime_call}
      0x00000000024f0190: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83)
      0x00000000024f0191: mulsd xmm0,QWORD PTR [rip+0xffffffffffffffa7] # 0x00000000024f0140 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@52 (line 62) ; {section_word}
      0x00000000024f0199: jmp 0x00000000024f01cb
      0x00000000024f019b: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff8d] # 0x00000000024f0130 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@46 (line 60) ; {section_word}
      0x00000000024f01a3: jmp 0x00000000024f01cb
      0x00000000024f01a5: cmp edx,0x5
      0x00000000024f01a8: je 0x00000000024f01b9
      0x00000000024f01aa: cmp edx,0x5
      0x00000000024f01ad: jg 0x00000000024f0184 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x00000000024f01af: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff81] # 0x00000000024f0138 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@64 (line 66) ; {section_word}
      0x00000000024f01b7: jmp 0x00000000024f01cb
      0x00000000024f01b9: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff67] # 0x00000000024f0128 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@70 (line 68) ; {section_word}
      0x00000000024f01c1: jmp 0x00000000024f01cb
      0x00000000024f01c3: mulsd xmm0,QWORD PTR [rip+0xffffffffffffff55] # 0x00000000024f0120 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) ; {section_word}
      0x00000000024f01cb: add rsp,0x10
      0x00000000024f01cf: pop rbp
      0x00000000024f01d0: test DWORD PTR [rip+0xfffffffffdf3fe2a],eax # 0x0000000000430000 ; {poll_return}
      0x00000000024f01d6: ret

With 18 cases, the assembly looks like this. Notice the array of pointers, which removes the need for all the comparisons: jmp QWORD PTR [r8+r10*1] jumps directly to the right multiplication. That is the likely reason for the performance improvement:

    [Verified Entry Point]
      # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
      # parm0: xmm0:xmm0 = double
      # parm1: rdx = int
      # [sp+0x20] (sp of caller)
      0x000000000287fe20: mov DWORD PTR [rsp-0x6000],eax ; {no_reloc}
      0x000000000287fe27: push rbp
      0x000000000287fe28: sub rsp,0x10 ;*synchronization entry ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
      0x000000000287fe2c: cmp edx,0x13
      0x000000000287fe2f: jae 0x000000000287fe46
      0x000000000287fe31: movsxd r10,edx
      0x000000000287fe34: shl r10,0x3
      0x000000000287fe38: movabs r8,0x287fd70 ; {section_word}
      0x000000000287fe42: jmp QWORD PTR [r8+r10*1] ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x000000000287fe46: mov ebp,edx
      0x000000000287fe48: mov edx,0x31
      0x000000000287fe4d: xchg ax,ax
      0x000000000287fe4f: call 0x00000000028590a0 ; OopMap{off=52} ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96) ; {runtime_call}
      0x000000000287fe54: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96)
      0x000000000287fe55: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe8b] # 0x000000000287fce8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@194 (line 92) ; {section_word}
      0x000000000287fe5d: jmp 0x000000000287ff16
      0x000000000287fe62: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe86] # 0x000000000287fcf0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@188 (line 90) ; {section_word}
      0x000000000287fe6a: jmp 0x000000000287ff16
      0x000000000287fe6f: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe81] # 0x000000000287fcf8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@182 (line 88) ; {section_word}
      0x000000000287fe77: jmp 0x000000000287ff16
      0x000000000287fe7c: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe7c] # 0x000000000287fd00 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@176 (line 86) ; {section_word}
      0x000000000287fe84: jmp 0x000000000287ff16
      0x000000000287fe89: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe77] # 0x000000000287fd08 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@170 (line 84) ; {section_word}
      0x000000000287fe91: jmp 0x000000000287ff16
      0x000000000287fe96: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe72] # 0x000000000287fd10 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@164 (line 82) ; {section_word}
      0x000000000287fe9e: jmp 0x000000000287ff16
      0x000000000287fea0: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe70] # 0x000000000287fd18 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@158 (line 80) ; {section_word}
      0x000000000287fea8: jmp 0x000000000287ff16
      0x000000000287feaa: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe6e] # 0x000000000287fd20 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@152 (line 78) ; {section_word}
      0x000000000287feb2: jmp 0x000000000287ff16
      0x000000000287feb4: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe24] # 0x000000000287fce0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@146 (line 76) ; {section_word}
      0x000000000287febc: jmp 0x000000000287ff16
      0x000000000287febe: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe6a] # 0x000000000287fd30 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@140 (line 74) ; {section_word}
      0x000000000287fec6: jmp 0x000000000287ff16
      0x000000000287fec8: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe68] # 0x000000000287fd38 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@134 (line 72) ; {section_word}
      0x000000000287fed0: jmp 0x000000000287ff16
      0x000000000287fed2: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe66] # 0x000000000287fd40 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@128 (line 70) ; {section_word}
      0x000000000287feda: jmp 0x000000000287ff16
      0x000000000287fedc: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe64] # 0x000000000287fd48 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@122 (line 68) ; {section_word}
      0x000000000287fee4: jmp 0x000000000287ff16
      0x000000000287fee6: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe62] # 0x000000000287fd50 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@116 (line 66) ; {section_word}
      0x000000000287feee: jmp 0x000000000287ff16
      0x000000000287fef0: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe60] # 0x000000000287fd58 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@110 (line 64) ; {section_word}
      0x000000000287fef8: jmp 0x000000000287ff16
      0x000000000287fefa: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe5e] # 0x000000000287fd60 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@104 (line 62) ; {section_word}
      0x000000000287ff02: jmp 0x000000000287ff16
      0x000000000287ff04: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe5c] # 0x000000000287fd68 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@98 (line 60) ; {section_word}
      0x000000000287ff0c: jmp 0x000000000287ff16
      0x000000000287ff0e: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe12] # 0x000000000287fd28 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56) ; {section_word}
      0x000000000287ff16: add rsp,0x10
      0x000000000287ff1a: pop rbp
      0x000000000287ff1b: test DWORD PTR [rip+0xfffffffffd9b00df],eax # 0x0000000000230000 ; {poll_return}
      0x000000000287ff21: ret

And finally, the assembly with 30 cases (below) looks similar to 18 cases, except for the additional movapd xmm0,xmm1 that appears towards the middle of the code, as spotted by @cHao. However, the likeliest reason for the drop in performance is that the method is too long to be inlined with the default JVM settings:

    [Verified Entry Point]
      # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
      # parm0: xmm0:xmm0 = double
      # parm1: rdx = int
      # [sp+0x20] (sp of caller)
      0x0000000002524560: mov DWORD PTR [rsp-0x6000],eax ; {no_reloc}
      0x0000000002524567: push rbp
      0x0000000002524568: sub rsp,0x10 ;*synchronization entry ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
      0x000000000252456c: movapd xmm1,xmm0
      0x0000000002524570: cmp edx,0x1f
      0x0000000002524573: jae 0x0000000002524592 ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x0000000002524575: movsxd r10,edx
      0x0000000002524578: shl r10,0x3
      0x000000000252457c: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe3c] # 0x00000000025243c0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@364 (line 118) ; {section_word}
      0x0000000002524584: movabs r8,0x2524450 ; {section_word}
      0x000000000252458e: jmp QWORD PTR [r8+r10*1] ;*tableswitch ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x0000000002524592: mov ebp,edx
      0x0000000002524594: mov edx,0x31
      0x0000000002524599: xchg ax,ax
      0x000000000252459b: call 0x00000000024f90a0 ; OopMap{off=64} ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120) ; {runtime_call}
      0x00000000025245a0: int3 ;*new ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120)
      0x00000000025245a1: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe27] # 0x00000000025243d0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@358 (line 116) ; {section_word}
      0x00000000025245a9: jmp 0x0000000002524744
      0x00000000025245ae: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe22] # 0x00000000025243d8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@348 (line 114) ; {section_word}
      0x00000000025245b6: jmp 0x0000000002524744
      0x00000000025245bb: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe1d] # 0x00000000025243e0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@338 (line 112) ; {section_word}
      0x00000000025245c3: jmp 0x0000000002524744
      0x00000000025245c8: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe18] # 0x00000000025243e8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@328 (line 110) ; {section_word}
      0x00000000025245d0: jmp 0x0000000002524744
      0x00000000025245d5: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe13] # 0x00000000025243f0 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@318 (line 108) ; {section_word}
      0x00000000025245dd: jmp 0x0000000002524744
      0x00000000025245e2: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe0e] # 0x00000000025243f8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@308 (line 106) ; {section_word}
      0x00000000025245ea: jmp 0x0000000002524744
      0x00000000025245ef: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe09] # 0x0000000002524400 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@298 (line 104) ; {section_word}
      0x00000000025245f7: jmp 0x0000000002524744
      0x00000000025245fc: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe04] # 0x0000000002524408 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@288 (line 102) ; {section_word}
      0x0000000002524604: jmp 0x0000000002524744
      0x0000000002524609: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdff] # 0x0000000002524410 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@278 (line 100) ; {section_word}
      0x0000000002524611: jmp 0x0000000002524744
      0x0000000002524616: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdfa] # 0x0000000002524418 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@268 (line 98) ; {section_word}
      0x000000000252461e: jmp 0x0000000002524744
      0x0000000002524623: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffd9d] # 0x00000000025243c8 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@258 (line 96) ; {section_word}
      0x000000000252462b: jmp 0x0000000002524744
      0x0000000002524630: movapd xmm0,xmm1
      0x0000000002524634: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffe0c] # 0x0000000002524448 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@242 (line 92) ; {section_word}
      0x000000000252463c: jmp 0x0000000002524744
      0x0000000002524641: movapd xmm0,xmm1
      0x0000000002524645: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffddb] # 0x0000000002524428 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@236 (line 90) ; {section_word}
      0x000000000252464d: jmp 0x0000000002524744
      0x0000000002524652: movapd xmm0,xmm1
      0x0000000002524656: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdd2] # 0x0000000002524430 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@230 (line 88) ; {section_word}
      0x000000000252465e: jmp 0x0000000002524744
      0x0000000002524663: movapd xmm0,xmm1
      0x0000000002524667: mulsd xmm0,QWORD PTR [rip+0xfffffffffffffdc9] # 0x0000000002524438 ;*dmul ; - javaapplication4.Test1::multiplyByPowerOfTen@224 (line 86) ; {section_word}
      [etc.]
      0x0000000002524744: add rsp,0x10
      0x0000000002524748: pop rbp
      0x0000000002524749: test DWORD PTR [rip+0xfffffffffde1b8b1],eax # 0x0000000000340000 ; {poll_return}
      0x000000000252474f: ret
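To make the two machine-code shapes easier to compare, here is a rough Java-level model of each. It is only an illustration: the class `DispatchShapes` and both method names are hypothetical, the six-entry range is arbitrary, and a data array stands in for the table of code addresses that the real jump table holds. HotSpot does not run anything like this source; the sketch just mirrors the comparison tree of the small listing and the range check plus indexed jump of the larger ones.

```java
// Illustrative model only; not HotSpot's implementation.
final class DispatchShapes {
    // Stands in for the jump table; the real table holds code addresses, not data.
    private static final double[] FACTORS = {1e0, 1e1, 1e2, 1e3, 1e4, 1e5};

    // Shape 1: a tree of comparisons, in the same order as the cmp/je/jg
    // sequence of the small listing (3 first, then 1 or 5, and so on).
    static double byComparisons(double d, int exponent) {
        if (exponent == 3) return d * 1e3;
        if (exponent > 3) {
            if (exponent == 5) return d * 1e5;
            if (exponent > 5) throw new IllegalArgumentException("Unhandled power of ten " + exponent);
            return d * 1e4;
        }
        if (exponent == 1) return d * 1e1;
        if (exponent > 1) return d * 1e2;
        if (exponent == 0) return d;
        throw new IllegalArgumentException("Unhandled power of ten " + exponent);
    }

    // Shape 2: one unsigned range check (the cmp/jae pair) followed by a
    // single indexed access (the jmp through the table).
    static double byTable(double d, int exponent) {
        if (Integer.compareUnsigned(exponent, FACTORS.length) >= 0) {
            throw new IllegalArgumentException("Unhandled power of ten " + exponent);
        }
        return d * FACTORS[exponent];
    }

    public static void main(String[] args) {
        for (int e = 0; e < FACTORS.length; e++) {
            System.out.println(e + ": " + (byComparisons(2.0, e) == byTable(2.0, e)));
        }
    }
}
```

In the first shape the number of comparisons grows with the number of cases, while the second does the same amount of work however many entries the table has. To check which shape the JIT actually picked for a given method, one option is to print the generated code with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, which needs the hsdis disassembler plugin to be installed.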