This article walks through a question about Java for-loop optimization and its accepted answer; it may be a useful reference for anyone who runs into the same behaviour.

Problem description
I made some runtime tests with Java for loops and noticed some strange behaviour. For my code I need wrapper objects for primitive types like int and double, to simulate in and out parameters, but that's not the point. Just look at my code.
How can objects with field access be faster than primitive types?

For loop with primitive type:

```java
public static void main(String[] args) {
    double max = 1000;
    for (int j = 1; j < 8; j++) {
        double i;
        max = max * 10;
        long start = System.nanoTime();
        for (i = 0; i < max; i++) {
        }
        long end = System.nanoTime();
        long microseconds = (end - start) / 1000;
        System.out.println("MicroTime primitive(max: =" + max + "): " + microseconds);
    }
}
```

Result (excerpt):

```
MicroTime primitive(max: =10000.0): 110
MicroTime primitive(max: =100000.0): 1081
...
```

For loop with wrapper object:

```java
public static void main(String[] args) {
    HDouble max = new HDouble();
    max.value = 1000;
    for (int j = 1; j < 8; j++) {
        HDouble i = new HDouble();
        max.value = max.value * 10;
        long start = System.nanoTime();
        for (i.value = 0; i.value < max.value; i.value++) {
        }
        long end = System.nanoTime();
        long microseconds = (end - start) / 1000;
        System.out.println("MicroTime wrapper(max: =" + max.value + "): " + microseconds);
    }
}
```

Result (excerpt):

```
MicroTime wrapper(max: =10000.0): 157
MicroTime wrapper(max: =100000.0): 1561
...
```

The more iterations, the faster the second code gets. But why? I know that the Java compiler and the JVM optimize my code, but I never thought that primitive types could be slower than objects with field access.

Does anyone have a plausible explanation for it?

Edit: the HDouble class:

```java
public class HDouble {
    public double value;

    public HDouble() {
    }

    public HDouble(double value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return String.valueOf(value);
    }
}
```

I also tested my loops with code in them. For example, I calculated a sum -> same behaviour (the difference is not that big, but I thought the primitive algorithm would have to be much faster?). First I thought that the calculation takes so long that the field access makes nearly no difference.

Wrapper for-loop:

```java
for (i.value = 0; i.value < max.value; i.value++) {
    sum.value = sum.value + i.value;
}
```

Result:

Primitive for-loop:

```java
for (i = 0; i < max; i++) {
    sum = sum + i;
}
```

Result:

Solution

It's so easy to get fooled by hand-made microbenchmarks - you never know what they actually measure. That's why there are special tools like JMH.
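JMH automates warmup, repeated measurement and dead-code protection. As a rough illustration of what it does for you (class and method names here are made up for this sketch, and the iteration counts are arbitrary), a hand-rolled harness would at minimum warm the method up before timing it and consume its result:

```java
public class MiniHarness {
    // The code under test returns its result, so the JIT
    // cannot discard the loop as dead code.
    static double sumTo(double max) {
        double sum = 0;
        for (double i = 0; i < max; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        double sink = 0;
        // Warmup iterations: let the JIT compile sumTo in normal (non-OSR) mode.
        for (int i = 0; i < 10; i++) {
            sink += sumTo(1_000_000);
        }
        // Measured iterations, taken only after warmup.
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            sink += sumTo(1_000_000);
            long end = System.nanoTime();
            System.out.printf("run: %.3f ms%n", (end - start) / 1e6);
        }
        System.out.println("sink=" + sink); // consume results so they stay observable
    }
}
```

Even this sketch omits things JMH handles (forking, statistical aggregation, blackholes), which is why the answer below recommends the real tool.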
But let's analyze what happens in this hand-made benchmark:

```java
static class HDouble {
    double value;
}

public static void main(String[] args) {
    primitive();
    wrapper();
}

public static void primitive() {
    long start = System.nanoTime();
    for (double d = 0; d < 1000000000; d++) {
    }
    long end = System.nanoTime();
    System.out.printf("Primitive: %.3f s\n", (end - start) / 1e9);
}

public static void wrapper() {
    HDouble d = new HDouble();
    long start = System.nanoTime();
    for (d.value = 0; d.value < 1000000000; d.value++) {
    }
    long end = System.nanoTime();
    System.out.printf("Wrapper: %.3f s\n", (end - start) / 1e9);
}
```

The results are somewhat similar to yours:

```
Primitive: 3.618 s
Wrapper: 1.380 s
```

Now repeat the test several times:

```java
public static void main(String[] args) {
    for (int i = 0; i < 5; i++) {
        primitive();
        wrapper();
    }
}
```

It gets more interesting:

```
Primitive: 3.661 s
Wrapper: 1.382 s
Primitive: 3.461 s
Wrapper: 1.380 s
Primitive: 1.376 s   <-- starting from 3rd iteration
Wrapper: 1.381 s     <-- the timings become equal
Primitive: 1.371 s
Wrapper: 1.372 s
Primitive: 1.379 s
Wrapper: 1.378 s
```

Looks like both methods finally got optimized. Run it once again, now logging JIT compiler activity:

-XX:-TieredCompilation -XX:CompileOnly=Test -XX:+PrintCompilation

```
   136    1 %  Test::primitive @ 6 (53 bytes)
  3725    1 %  Test::primitive @ -2 (53 bytes)   made not entrant
Primitive: 3.589 s
  3748    2 %  Test::wrapper @ 17 (73 bytes)
  5122    2 %  Test::wrapper @ -2 (73 bytes)   made not entrant
Wrapper: 1.374 s
  5122    3    Test::primitive (53 bytes)
  5124    4 %  Test::primitive @ 6 (53 bytes)
Primitive: 3.421 s
  8544    5    Test::wrapper (73 bytes)
  8547    6 %  Test::wrapper @ 17 (73 bytes)
Wrapper: 1.378 s
Primitive: 1.372 s
Wrapper: 1.375 s
Primitive: 1.378 s
Wrapper: 1.373 s
Primitive: 1.375 s
Wrapper: 1.378 s
```

Note the % sign in the compilation log on the first iteration. It means that the methods were compiled in OSR (on-stack replacement) mode. During the second iteration the methods were recompiled in normal mode.
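For reference, the flags above go on the java command line. Assuming the benchmark class is named Test (as in the log), an invocation might look like this (output format varies by JVM version):

```shell
# compile the benchmark class (assumed to live in Test.java)
javac Test.java

# rerun with JIT compilation logging; -XX:CompileOnly limits
# compilation to methods of Test so the log stays readable
java -XX:-TieredCompilation -XX:CompileOnly=Test -XX:+PrintCompilation Test

# the -XX:+PrintAssembly flag used later additionally requires the
# hsdis disassembler library to be available on the JVM's library path
```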
Since then, starting from the third iteration, there was no difference between primitive and wrapper in execution speed. What you've actually measured is the performance of the OSR stub. It is usually not related to the real performance of an application, and you shouldn't care much about it.

But the question still remains: why is the OSR stub for a wrapper compiled better than for a primitive variable? To find out, we need to get down to the generated assembly code:

-XX:CompileOnly=Test -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly

I'll omit all irrelevant code, leaving only the compiled loop.

Primitive:

```
0x00000000023e90d0: vmovsd 0x28(%rsp),%xmm1      <-- load double from the stack
0x00000000023e90d6: vaddsd -0x7e(%rip),%xmm1,%xmm1
0x00000000023e90de: test   %eax,-0x21f90e4(%rip)
0x00000000023e90e4: vmovsd %xmm1,0x28(%rsp)      <-- store to the stack
0x00000000023e90ea: vucomisd 0x28(%rsp),%xmm0    <-- compare with the stack value
0x00000000023e90f0: ja     0x00000000023e90d0
```

Wrapper:

```
0x00000000023ebe90: vaddsd -0x78(%rip),%xmm0,%xmm0
0x00000000023ebe98: vmovsd %xmm0,0x10(%rbx)      <-- store to the object field
0x00000000023ebe9d: test   %eax,-0x21fbea3(%rip)
0x00000000023ebea3: vucomisd %xmm0,%xmm1         <-- compare registers
0x00000000023ebea7: ja     0x00000000023ebe90
```

As you can see, the 'primitive' case makes a number of loads and stores to a stack location, while the 'wrapper' case works mostly on registers. It is quite understandable why the OSR stub refers to the stack: in interpreted mode, local variables are stored on the stack, and the OSR stub is made compatible with this interpreted frame. In the 'wrapper' case the value is stored on the heap, and the reference to the object is already cached in a register.
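Coming back to the questioner's original motivation, using wrapper objects to simulate in and out parameters: the pattern itself is fine, and as the steady-state timings above show, once the JIT caches the object reference in a register a field access performs like a primitive. A minimal sketch of that out-parameter pattern (the method names here are hypothetical, not from the question):

```java
public class OutParamDemo {
    // mutable holder, as in the question's HDouble class
    public static class HDouble {
        public double value;

        public HDouble() {
        }

        public HDouble(double value) {
            this.value = value;
        }
    }

    // Hypothetical example: "returns" both quotient and remainder,
    // the remainder through a mutable out-parameter.
    static double divide(double a, double b, HDouble remainderOut) {
        remainderOut.value = a % b;
        return a / b;
    }

    public static void main(String[] args) {
        HDouble rem = new HDouble();
        double q = divide(7.0, 2.0, rem);
        System.out.println(q + " remainder " + rem.value); // prints 3.5 remainder 1.0
    }
}
```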