






I'm playing with JMH, and in the section about looping they said that

I tried it myself:

    public int measurewrong_1() {
        return reps(1);
    }

    public int measurewrong_1000() {
        return reps(1000);
    }

and got the following result:

Benchmark                      Mode  Cnt  Score    Error  Units
MyBenchmark.measurewrong_1     avgt   15  2.425 ±  0.137  ns/op
MyBenchmark.measurewrong_1000  avgt   15  0.036 ±  0.001  ns/op

It does show that MyBenchmark.measurewrong_1000 is dramatically faster than MyBenchmark.measurewrong_1, but I cannot really understand what optimization the JVM performs to get this improvement.

What do they mean when they say the loop is unrolled/pipelined?


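Loop unrolling makes pipelining more effective. With several copies of the loop body in a row, a pipelined CPU (for example, a RISC design) can overlap their execution instead of paying for a loop-condition check and a branch after every single operation.

So if your CPU is able to execute 5 pipelines in parallel, your loop would be unrolled roughly like this: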

    // pseudo code (assumes length is a multiple of pipelines)
    int pipelines = 5;
    for (int i = 0; i < length; i += pipelines) {
        s += (x + y);
        s += (x + y);
        s += (x + y);
        s += (x + y);
        s += (x + y);
    }
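
To make the pseudo code above concrete, here is a minimal runnable sketch of the same idea written by hand in Java. The class name `UnrollDemo`, the method names, and the unroll factor of 5 are assumptions for illustration only. The JIT compiler chooses its own unroll factor and does this transformation on machine code, not on your source.

```java
public class UnrollDemo {
    // Plain loop: one addition per iteration, plus one condition check and branch each time.
    static int sumPlain(int x, int y, int length) {
        int s = 0;
        for (int i = 0; i < length; i++) {
            s += (x + y);
        }
        return s;
    }

    // Hand-unrolled by a hypothetical factor of 5: one condition check per five additions.
    static int sumUnrolled(int x, int y, int length) {
        int s = 0;
        int i = 0;
        // Largest multiple of 5 that does not exceed length (0 or less for negative lengths).
        int limit = length - (length % 5);
        for (; i < limit; i += 5) {
            s += (x + y);
            s += (x + y);
            s += (x + y);
            s += (x + y);
            s += (x + y);
        }
        // Remainder loop: handles the last (length % 5) iterations.
        for (; i < length; i++) {
            s += (x + y);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sumPlain(1, 2, 1000));    // prints 3000
        System.out.println(sumUnrolled(1, 2, 1000)); // prints 3000
    }
}
```

Both methods return the same value. The unrolled version simply spends fewer instructions on loop bookkeeping per addition, which is why the measured cost per operation appears to shrink as the repetition count grows.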

In the classic five-stage RISC pipeline, the stages are: IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back
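
As a rough illustration of why overlapping these stages helps, the sketch below counts cycles for an idealized five-stage pipeline. It assumes one cycle per stage and no stalls, hazards, or branch mispredictions, which real CPUs do not guarantee. The class and method names are hypothetical.

```java
public class PipelineDemo {
    // The five classic stages named above.
    static final String[] STAGES = {"IF", "ID", "EX", "MEM", "WB"};

    // No overlap: each instruction passes through every stage before the next one starts.
    static long cyclesSequential(long instructions) {
        return instructions * STAGES.length;
    }

    // Ideal pipeline: the first instruction needs STAGES.length cycles,
    // and every following instruction finishes one cycle after the previous one.
    static long cyclesPipelined(long instructions) {
        return instructions == 0 ? 0 : STAGES.length + (instructions - 1);
    }

    public static void main(String[] args) {
        long n = 5;
        System.out.println("sequential: " + cyclesSequential(n)); // prints sequential: 25
        System.out.println("pipelined:  " + cyclesPipelined(n));  // prints pipelined:  9
    }
}
```

Under these assumptions the cost per instruction approaches one cycle as the instruction count grows, provided the instructions in flight do not have to wait on each other.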

From Oracle White paper:

More information about pipelining: Classic RISC pipeline


08-30 04:47