Closed. This question is off-topic. It is not currently accepting answers.
Closed 7 years ago.
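We have just taken delivery of a powerful 32-core AMD Opteron server with 128 GB of RAM. It has two 6272 CPUs with 16 cores each. We are running a large, long-running Java task on 30 threads. We have the NUMA optimisations for Linux and Java turned on. Our Java threads mainly use objects that are private to each thread, sometimes read memory that other threads will also read, and very, very occasionally write to or lock shared objects.
We can't explain why the CPU cores are 25% idle. Below is a dump of "top":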
top - 23:06:38 up 1 day, 23 min,  3 users,  load average: 10.84, 10.27, 9.62
Tasks: 676 total,   1 running, 675 sleeping,   0 stopped,   0 zombie
Cpu(s): 64.5%us,  1.3%sy,  0.0%ni, 32.9%id,  1.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132138168k total, 131652664k used,   485504k free,    92340k buffers
Swap:  5701624k total,   230252k used,  5471372k free, 13444344k cached
...
top - 22:37:39 up 23:54,  3 users,  load average: 7.83, 8.70, 9.27
Tasks: 678 total,   1 running, 677 sleeping,   0 stopped,   0 zombie
Cpu0  : 75.8%us,  2.0%sy,  0.0%ni, 22.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 77.2%us,  1.3%sy,  0.0%ni, 21.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 77.3%us,  1.0%sy,  0.0%ni, 21.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 77.8%us,  1.0%sy,  0.0%ni, 21.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 76.9%us,  2.0%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 76.3%us,  2.0%sy,  0.0%ni, 21.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 12.6%us,  3.0%sy,  0.0%ni, 84.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  8.6%us,  2.0%sy,  0.0%ni, 89.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 77.0%us,  2.0%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  : 77.0%us,  2.0%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 : 77.6%us,  1.7%sy,  0.0%ni, 20.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 : 75.7%us,  2.0%sy,  0.0%ni, 21.4%id,  1.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 76.6%us,  2.3%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 : 76.6%us,  2.3%sy,  0.0%ni, 21.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 : 76.2%us,  2.6%sy,  0.0%ni, 15.9%id,  5.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 : 76.6%us,  2.0%sy,  0.0%ni, 21.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 : 73.6%us,  2.6%sy,  0.0%ni, 23.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 : 74.5%us,  2.3%sy,  0.0%ni, 23.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 : 73.9%us,  2.3%sy,  0.0%ni, 23.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 : 72.9%us,  2.6%sy,  0.0%ni, 24.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 : 72.8%us,  2.6%sy,  0.0%ni, 24.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu21 : 72.7%us,  2.3%sy,  0.0%ni, 25.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 : 72.5%us,  2.6%sy,  0.0%ni, 24.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 : 73.0%us,  2.3%sy,  0.0%ni, 24.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu24 : 74.7%us,  2.7%sy,  0.0%ni, 22.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu25 : 74.5%us,  2.6%sy,  0.0%ni, 22.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu26 : 73.7%us,  2.0%sy,  0.0%ni, 24.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu27 : 74.1%us,  2.3%sy,  0.0%ni, 23.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu28 : 74.1%us,  2.3%sy,  0.0%ni, 23.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu29 : 74.0%us,  2.0%sy,  0.0%ni, 24.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu30 : 73.2%us,  2.3%sy,  0.0%ni, 24.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu31 : 73.1%us,  2.0%sy,  0.0%ni, 24.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132138168k total, 131711704k used,   426464k free,    88336k buffers
Swap:  5701624k total,   229572k used,  5472052k free, 13745596k cached
  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM     TIME+  COMMAND
13865 root      20   0  122g 112g 3.1g S 2334.3 89.6  20726:49 java
27139 jayen     20   0 15428 1728  952 S    2.6  0.0   0:04.21 top
27161 sysadmin  20   0 15428 1712  940 R    1.0  0.0   0:00.28 top
   33 root      20   0     0    0    0 S    0.3  0.0   0:06.24 ksoftirqd/7
  131 root      20   0     0    0    0 S    0.3  0.0   0:09.52 events/0
 1858 root      20   0     0    0    0 S    0.3  0.0   1:35.14 kondemand/0
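A dump of the Java stacks confirms that no threads are anywhere near the few places where locks are used, nor anywhere near any disk or network I/O.
I had trouble finding a clear explanation of what "top" means by "idle" versus "wait", but I get the impression that "idle" means "no more threads need to be run", and that doesn't make sense in our case. We're using "Executors.newFixedThreadPool(30)". There is a large number of tasks pending, and each task lasts for 10 seconds or so.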
I suspect that the explanation requires a good understanding of NUMA. Is the "idle" state what you see when a CPU is waiting on a non-local access? If not, then what is the explanation?
Best answer
It could be a number of things:
- It could be contention between threads for access to shared data. This could take the form of lock contention, or of extra memory traffic due to read or write barriers, though the latter is unlikely to produce these symptoms.
- You are leaking worker threads; e.g. they occasionally die and are not replaced.
- There could be a bottleneck in the executor itself; e.g. it might not be able to respond quickly enough to a task finishing by scheduling the next one.
- The bottleneck could be the garbage collector, especially if you don't have parallel collection enabled.
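This page discusses the NUMA enhancements for Java, and mentions a switch to enable NUMA-aware GC. Try it. Also look at the other GC tuning advice on that page.
This question explains the process states: In linux, what do all the values in the "top" command mean?.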
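One rough way to test the garbage-collector hypothesis from inside the JVM is to compare accumulated collection time with uptime. The sketch below is only illustrative: the class name `GcOverhead` and its helper methods are made up for this example, and it has to run inside the JVM that hosts the workload (for instance from a monitoring thread). Note that `getCollectionTime()` is an approximate figure and, for concurrent collectors, is not the same as time the application threads were stopped. The dump of JVM input arguments is there to confirm which GC and NUMA switches (such as HotSpot's `-XX:+UseNUMA` or `-XX:+UseParallelGC`) are actually in effect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of the original post.
public class GcOverhead {

    /** Fraction of uptime spent collecting; 0.0 if the uptime is not positive. */
    public static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) Math.max(gcMillis, 0L) / uptimeMillis;
    }

    /** Sums accumulated collection time over all collectors, skipping -1 (undefined). */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Per-collector totals since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.1f%%%n",
                100.0 * gcFraction(totalGcMillis(), uptime));

        // Shows which -XX switches the JVM was really started with.
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + flags);
    }
}
```

If the reported share is small, the collector is probably not what keeps the cores idle, and the other possibilities deserve more attention.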
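I believe the difference between "wa" and "idle" time in the processor summary is that "wa" means the processor has threads in the "D" state, i.e. waiting for disk I/O. By contrast, a processor whose threads are all in the "S" state is counted as "idle". (From this perspective, a thread that is waiting on a lock would be in the S state.)
You could also try using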
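The same question can be asked from the Java side: how many pool threads exist, and what are they doing? The sketch below counts live threads by `Thread.State`. It is an illustration under assumptions: the class `PoolThreadStates` is hypothetical, it must be called from inside the JVM running the workload, and it assumes the pool uses the default thread factory, whose threads are named `pool-N-thread-M`. Java thread states are also not the Linux S/D states: a `BLOCKED` thread is waiting for a monitor, while `WAITING` or `TIMED_WAITING` usually means parked, for example an idle worker waiting on the task queue.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of the original post.
public class PoolThreadStates {

    /** Counts live threads whose name starts with namePrefix, grouped by state. */
    public static Map<Thread.State, Integer> countStates(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Small stand-in for the real 30-thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 4; i++) {
            // Each task parks until released, so its worker is not RUNNABLE.
            pool.submit(() -> {
                release.await();
                return null;
            });
        }
        Thread.sleep(200); // give the workers time to start and park
        System.out.println(countStates("pool-"));
        release.countDown();
        pool.shutdown();
    }
}
```

Sampled periodically in the real application, a total below 30 would point at lost worker threads, while many non-RUNNABLE workers despite a full queue would point at contention or at the executor itself.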
top -H
to show the threads individually.
Regarding "java - Why do my Opteron cores each run at only 75% capacity? (25% CPU idle)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12738991/
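A per-thread view similar to what top -H gives can also be obtained from inside the JVM, with Java thread names attached. This is a minimal sketch assuming the JVM supports per-thread CPU timing; `PerThreadCpu` is a hypothetical class name, and the code has to run in the same JVM as the workload.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the original post.
public class PerThreadCpu {

    /** CPU time of a thread in milliseconds, or -1 if it is unavailable. */
    public static long cpuMillis(ThreadMXBean mx, long threadId) {
        long nanos = mx.getThreadCpuTime(threadId); // -1 if disabled or thread is dead
        return nanos < 0 ? -1 : nanos / 1_000_000;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            System.out.println("Per-thread CPU time is not available on this JVM.");
            return;
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info == null) {
                continue; // thread ended between the two calls
            }
            System.out.printf("%-40s %-13s cpu=%d ms%n",
                    info.getThreadName(), info.getThreadState(), cpuMillis(mx, id));
        }
    }
}
```

Taking two samples a few seconds apart and comparing the CPU times shows which workers are actually burning CPU and which are mostly waiting.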
10-16 12:51