Problem description
I've decided to measure incrementation with different locking strategies, using JMH for this purpose. I'm using JMH to check throughput and average time, plus a simple custom test to check correctness. There are seven strategies:
- Atomic count
- ReadWrite locking count
- Synchronizing with volatile
- Synchronizing block without volatile
- sun.misc.Unsafe.compareAndSwap
- sun.misc.Unsafe.getAndAdd
- Unsynchronized count
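The counter classes themselves are not shown in the post. As a rough sketch of what three of these strategies might look like, assuming a Counter interface with increment() and getCounter() (inferred from the calls in the benchmark and test code; the long return type, the class bodies, and the CounterSketchDemo helper are assumptions, not the author's code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface, inferred from the increment()/getCounter() calls in the post.
interface Counter {
    void increment();
    long getCounter();
}

// No synchronization: concurrent increments can be lost.
class UnsyncCounter implements Counter {
    private long counter;
    public void increment() { counter++; }
    public long getCounter() { return counter; }
}

// A synchronized block guards a plain (non-volatile) field.
class SyncNoVolatileCounter implements Counter {
    private long counter;
    public void increment() { synchronized (this) { counter++; } }
    public long getCounter() { synchronized (this) { return counter; } }
}

// AtomicLong performs the read-modify-write as one atomic operation.
class AtomicCounter implements Counter {
    private final AtomicLong counter = new AtomicLong();
    public void increment() { counter.incrementAndGet(); }
    public long getCounter() { return counter.get(); }
}

final class CounterSketchDemo {
    // Starts `threads` workers that each call increment() `incrementsPerThread`
    // times, waits for all of them, and returns the final counter value.
    static long hammer(Counter c, int threads, int incrementsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    c.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.getCounter();
    }

    public static void main(String[] args) {
        System.out.println(hammer(new AtomicCounter(), 4, 100_000));         // 400000
        System.out.println(hammer(new SyncNoVolatileCounter(), 4, 100_000)); // 400000
        System.out.println(hammer(new UnsyncCounter(), 4, 100_000));         // may be lower
    }
}
```

Only the unsynchronized variant is allowed to lose updates, which matches the UnsyncCounter figure in the correctness run further down.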
Benchmark code:
@State(Scope.Benchmark)
@BenchmarkMode({Mode.Throughput, Mode.AverageTime})
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class UnsafeCounter_Benchmark {
    public Counter unsync, syncNoV, syncV, lock, atomic, unsafe, unsafeGA;

    @Setup(Level.Iteration)
    public void prepare() {
        unsync = new UnsyncCounter();
        syncNoV = new SyncNoVolatileCounter();
        syncV = new SyncVolatileCounter();
        lock = new LockCounter();
        atomic = new AtomicCounter();
        unsafe = new UnsafeCASCounter();
        unsafeGA = new UnsafeGACounter();
    }

    @Benchmark
    public void unsyncCount() {
        unsyncCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsyncCounter() {
        unsync.increment();
    }

    @Benchmark
    public void syncNoVCount() {
        syncNoVCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void syncNoVCounter() {
        syncNoV.increment();
    }

    @Benchmark
    public void syncVCount() {
        syncVCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void syncVCounter() {
        syncV.increment();
    }

    @Benchmark
    public void lockCount() {
        lockCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void lockCounter() {
        lock.increment();
    }

    @Benchmark
    public void atomicCount() {
        atomicCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void atomicCounter() {
        atomic.increment();
    }

    @Benchmark
    public void unsafeCount() {
        unsafeCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsafeCounter() {
        unsafe.increment();
    }

    @Benchmark
    public void unsafeGACount() {
        unsafeGACounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsafeGACounter() {
        unsafeGA.increment();
    }

    public static void main(String[] args) throws RunnerException {
        Options baseOpts = new OptionsBuilder()
                .include(UnsafeCounter_Benchmark.class.getSimpleName())
                .threads(100)
                .jvmArgs("-ea")
                .build();
        new Runner(baseOpts).run();
    }
}
And the benchmark results:
JDK 8u20
Benchmark Mode Samples Score Error Units
o.k.u.u.UnsafeCounter_Benchmark.atomicCount thrpt 5 42.178 ± 17.643 ops/us
o.k.u.u.UnsafeCounter_Benchmark.lockCount thrpt 5 24.044 ± 2.264 ops/us
o.k.u.u.UnsafeCounter_Benchmark.syncNoVCount thrpt 5 22.849 ± 1.344 ops/us
o.k.u.u.UnsafeCounter_Benchmark.syncVCount thrpt 5 20.235 ± 2.027 ops/us
o.k.u.u.UnsafeCounter_Benchmark.unsafeCount thrpt 5 12.460 ± 1.326 ops/us
o.k.u.u.UnsafeCounter_Benchmark.unsafeGACount thrpt 5 39.106 ± 2.966 ops/us
o.k.u.u.UnsafeCounter_Benchmark.unsyncCount thrpt 5 93.076 ± 9.674 ops/us
o.k.u.u.UnsafeCounter_Benchmark.atomicCount avgt 5 2.604 ± 0.133 us/op
o.k.u.u.UnsafeCounter_Benchmark.lockCount avgt 5 4.161 ± 0.546 us/op
o.k.u.u.UnsafeCounter_Benchmark.syncNoVCount avgt 5 4.440 ± 0.523 us/op
o.k.u.u.UnsafeCounter_Benchmark.syncVCount avgt 5 5.073 ± 0.439 us/op
o.k.u.u.UnsafeCounter_Benchmark.unsafeCount avgt 5 9.088 ± 5.964 us/op
o.k.u.u.UnsafeCounter_Benchmark.unsafeGACount avgt 5 2.611 ± 0.164 us/op
o.k.u.u.UnsafeCounter_Benchmark.unsyncCount avgt 5 1.047 ± 0.050 us/op
Most of the measurements are as I expected, except UnsafeCounter_Benchmark.unsafeCount, which uses sun.misc.Unsafe.compareAndSwapLong in a while loop. It is the slowest locking strategy.
public void increment() {
    long before = counter;
    while (!unsafe.compareAndSwapLong(this, offset, before, before + 1L)) {
        before = counter;
    }
}
I suspect the low performance comes from the while loop and from JMH creating higher contention, but when I checked correctness with Executors I got the figures I expected:
Counter result: UnsyncCounter 97538676
Time passed in ms:259
Counter result: AtomicCounter 100000000
Time passed in ms:1805
Counter result: LockCounter 100000000
Time passed in ms:3904
Counter result: SyncNoVolatileCounter 100000000
Time passed in ms:14227
Counter result: SyncVolatileCounter 100000000
Time passed in ms:19224
Counter result: UnsafeCASCounter 100000000
Time passed in ms:8077
Counter result: UnsafeGACounter 100000000
Time passed in ms:2549
Correctness test code:
public class UnsafeCounter_Test {

    static class CounterClient implements Runnable {
        private Counter c;
        private int num;

        public CounterClient(Counter c, int num) {
            this.c = c;
            this.num = num;
        }

        @Override
        public void run() {
            for (int i = 0; i < num; i++) {
                c.increment();
            }
        }
    }

    public static void makeTest(Counter counter) throws InterruptedException {
        int NUM_OF_THREADS = 1000;
        int NUM_OF_INCREMENTS = 100000;
        ExecutorService service = Executors.newFixedThreadPool(NUM_OF_THREADS);
        long before = System.currentTimeMillis();
        for (int i = 0; i < NUM_OF_THREADS; i++) {
            service.submit(new CounterClient(counter, NUM_OF_INCREMENTS));
        }
        service.shutdown();
        service.awaitTermination(1, TimeUnit.MINUTES);
        long after = System.currentTimeMillis();
        System.out.println("Counter result: " + counter.getClass().getSimpleName() + " " + counter.getCounter());
        System.out.println("Time passed in ms:" + (after - before));
    }

    public static void main(String[] args) throws InterruptedException {
        makeTest(new UnsyncCounter());
        makeTest(new AtomicCounter());
        makeTest(new LockCounter());
        makeTest(new SyncNoVolatileCounter());
        makeTest(new SyncVolatileCounter());
        makeTest(new UnsafeCASCounter());
        makeTest(new UnsafeGACounter());
    }
}
I know this is a very awful test, but in this case Unsafe CAS is two times faster than the Sync variants and everything goes as expected. Could somebody clarify the described behavior? For more information please see the GitHub repo: Bench, Unsafe CAS counter
Recommended answer
Thinking out loud: it is remarkable how often people do 90% of the tedious work and leave the 10% (where the fun begins) for someone else! All right, I'm taking all the fun!
Let me first repeat the experiment on my i7-4790K, 8u40 EA:
Benchmark Mode Samples Score Error Units
UnsafeCounter_Benchmark.atomicCount thrpt 5 47.669 ± 18.440 ops/us
UnsafeCounter_Benchmark.lockCount thrpt 5 14.497 ± 7.815 ops/us
UnsafeCounter_Benchmark.syncNoVCount thrpt 5 11.618 ± 2.130 ops/us
UnsafeCounter_Benchmark.syncVCount thrpt 5 11.337 ± 4.532 ops/us
UnsafeCounter_Benchmark.unsafeCount thrpt 5 7.452 ± 1.042 ops/us
UnsafeCounter_Benchmark.unsafeGACount thrpt 5 43.332 ± 3.435 ops/us
UnsafeCounter_Benchmark.unsyncCount thrpt 5 102.773 ± 11.943 ops/us
Truly, something seems fishy about the unsafeCount test. Really, you have to presume all data is fishy until you have validated it. For nanobenchmarks, you have to validate the generated code to see whether you actually measure what you want to measure. In JMH, that is very quickly doable with -prof perfasm. In fact, if you look at the hottest region of unsafeCount there, you will notice a few funny things:
0.12% 0.04% 0x00007fb45518e7d1: mov 0x10(%r10),%rax
17.03% 23.44% 0x00007fb45518e7d5: test %eax,0x17318825(%rip)
0.21% 0.07% 0x00007fb45518e7db: mov 0x18(%r10),%r11 ; getfield offset
30.33% 10.77% 0x00007fb45518e7df: mov %rax,%r8
0.00% 0x00007fb45518e7e2: add $0x1,%r8
0.01% 0x00007fb45518e7e6: cmp 0xc(%r10),%r12d ; typecheck
0x00007fb45518e7ea: je 0x00007fb45518e80b ; bail to v-call
0.83% 0.48% 0x00007fb45518e7ec: lock cmpxchg %r8,(%r10,%r11,1)
33.27% 25.52% 0x00007fb45518e7f2: sete %r8b
0.12% 0.01% 0x00007fb45518e7f6: movzbl %r8b,%r8d
0.03% 0.04% 0x00007fb45518e7fa: test %r8d,%r8d
0x00007fb45518e7fd: je 0x00007fb45518e7d1 ; back branch
Translation: a) the offset field gets re-read on each iteration, because CAS memory effects imply a volatile read, and therefore the field needs to be pessimistically re-read; b) the hilarious part is that the unsafe field is also re-read for a typecheck, for the same reason.
This is why high-performance code should look like this:
--- a/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
+++ b/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
@@ -5,13 +5,13 @@ import sun.misc.Unsafe;
 public class UnsafeCASCounter implements Counter {
     private volatile long counter = 0;
-    private final Unsafe unsafe = UnsafeHelper.unsafe;
-    private long offset;
-    {
+    private static final Unsafe unsafe = UnsafeHelper.unsafe;
+    private static final long offset;
+    static {
         try {
             offset = unsafe.objectFieldOffset(UnsafeCASCounter.class.getDeclaredField("counter"));
         } catch (NoSuchFieldException e) {
-            e.printStackTrace();
+            throw new IllegalStateException("Whoops!");
         }
     }
If you do that, the unsafeCount performance boosts right up:
Benchmark Mode Samples Score Error Units
UnsafeCounter_Benchmark.unsafeCount thrpt 5 9.733 ± 0.673 ops/us
...which is fairly close to the synchronized tests now, given the error bounds. If you look at the -prof perfasm output now, this is the unsafeCount loop:
0.08% 0.02% 0x00007f7575191900: mov 0x10(%r10),%rax
28.09% 28.64% 0x00007f7575191904: test %eax,0x161286f6(%rip)
0.23% 0.08% 0x00007f757519190a: mov %rax,%r11
0x00007f757519190d: add $0x1,%r11
0x00007f7575191911: lock cmpxchg %r11,0x10(%r10)
47.27% 23.48% 0x00007f7575191917: sete %r8b
0.10% 0x00007f757519191b: movzbl %r8b,%r8d
0.02% 0x00007f757519191f: test %r8d,%r8d
0x00007f7575191922: je 0x00007f7575191900
This loop is very tight, and it seems nothing can make it go faster. We spend most of the time loading the "updated" value and actually CAS-ing it. But we contend a lot! To figure out whether contention is the leading cause, let's add backoffs:
--- a/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
+++ b/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
@@ -20,6 +21,7 @@ public class UnsafeCASCounter implements Counter {
         long before = counter;
         while (!unsafe.compareAndSwapLong(this, offset, before, before + 1L)) {
             before = counter;
+            Blackhole.consumeCPU(1000);
         }
     }
...running:
Benchmark Mode Samples Score Error Units
UnsafeCounter_Benchmark.unsafeCount thrpt 5 99.869 ± 107.933 ops/us
Voila. We do more work in the loop, but it saves us from contending a lot. I tried to explain this before in "Nanotrusting the Nanotime"; it might be good to go back there and read up more on benchmarking methodology, especially when heavy-weight operations are measured. This highlights the pitfall in the entire experiment, not only with unsafeCount.
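For readers who want to try the backoff idea outside JMH, here is a minimal sketch. It deliberately swaps in supported APIs: AtomicLong.compareAndSet stands in for Unsafe.compareAndSwapLong, and a Thread.onSpinWait() loop (Java 9+) stands in for Blackhole.consumeCPU. The class name, the spins parameter, and the run helper are hypothetical, and the sketch makes no claim about how much throughput the backoff recovers on any given machine.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: not the benchmark's UnsafeCASCounter, but the same retry shape.
final class BackoffCasCounter {
    private final AtomicLong counter = new AtomicLong();
    private final int spins; // hypothetical knob: spin-wait hints per failed CAS

    BackoffCasCounter(int spins) {
        this.spins = spins;
    }

    void increment() {
        long before = counter.get();
        while (!counter.compareAndSet(before, before + 1L)) {
            // The CAS lost the race: back off before touching the contended
            // memory location again, then re-read the current value.
            for (int i = 0; i < spins; i++) {
                Thread.onSpinWait();
            }
            before = counter.get();
        }
    }

    long get() {
        return counter.get();
    }

    // Starts `threads` workers that each increment `incrementsPerThread` times,
    // waits for all of them, and returns the final count.
    static long run(int threads, int incrementsPerThread, int spins) {
        BackoffCasCounter c = new BackoffCasCounter(spins);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    c.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.get();
    }
}
```

Calling BackoffCasCounter.run(4, 100_000, 0) and then the same with a non-zero spins value gives the same final count either way; only the time spent retrying differs, which is the effect the experiment above isolates.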
Exercise for the OP and interested readers: explain why unsafeGACount and atomicCount perform much faster than the other tests. You have the tools now.
P.S. Running N threads on a machine with C (C < N) hardware threads is silly: you might think you have "contention" with N threads, but instead you are running and "contending" C threads only. It is especially amusing when people run 1000 threads on a 4-core machine...
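One way to act on this is to cap the requested thread count at what the JVM reports as available. This is a small sketch: the ThreadCount class and its cap method are hypothetical names, and Runtime.availableProcessors() reports the processors visible to the JVM, which may differ from the physical core count.

```java
final class ThreadCount {
    // Caps a requested thread count at the number of processors the JVM
    // reports, and never returns less than one.
    static int cap(int requested) {
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, hardwareThreads));
    }

    public static void main(String[] args) {
        System.out.println("hardware threads: " + Runtime.getRuntime().availableProcessors());
        System.out.println("threads used for a requested 1000: " + cap(1000));
    }
}
```

The capped value could then be passed to the .threads(...) call in the benchmark's main method, or to Executors.newFixedThreadPool in the correctness test, in place of the hard-coded 100 and 1000.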
P.P.S. Time check: 10 minutes to do the profiling and additional experiments, 20 minutes to write it up. And how much time did you waste replicating the result by hand? ;)