Problem description

From the documentation for GNU make: http://www.gnu.org/software/make/manual/make.html#Parallel

 -l 2.5

will not let make start more than one job if the load average is above 2.5. The ‘-l’ option with no following number removes the load limit, if one was given with a previous ‘-l’ option.

More precisely, when make goes to start up a job, and it already has at least one job running, it checks the current load average; if it is not lower than the limit given with ‘-l’, make waits until the load average goes below that limit, or until all the other jobs finish.
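That pre-spawn check can be sketched in shell. This is a simplified approximation of the documented behavior, not gmake's actual code; the 2.5 limit just mirrors the manual's example:

```shell
# Simplified sketch of the check make performs before starting a job
# (assumption: this models the documented behavior, not gmake's source).
limit=2.5
# First field of /proc/loadavg is the 1-minute load average (Linux).
load=$(cut -d' ' -f1 /proc/loadavg)
if awk -v l="$load" -v m="$limit" 'BEGIN { exit !(l < m) }'; then
  echo "load $load is below $limit: OK to start another job"
else
  echo "load $load is at/above $limit: wait for running jobs to finish"
fi
```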

From the Linux man page for uptime: http://www.unix.com/man-page/Linux/1/uptime/

I have a parallel makefile and I want to do the obvious thing: have make keep adding processes until I'm getting full CPU usage, but without inducing thrashing.

Many (all?) machines today are multicore, so that means that the load average is not the number make should be checking, as that number needs to be adjusted for the number of cores.
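A common workaround is to derive the limit from the core count at invocation time; this is a local policy choice, not anything gmake provides. The 1.5x headroom factor below is an invented starting point to tune empirically:

```shell
# Scale -j and -l by the core count (Linux; nproc is in GNU coreutils).
cores=$(nproc)
jobs=$((cores + 1))                 # assumed policy: one extra job
limit=$(awk -v c="$cores" 'BEGIN { printf "%.1f", c * 1.5 }')
# The real invocation would be: make -j"$jobs" -l"$limit"
echo "would run: make -j$jobs -l$limit"
```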

Does this mean that the --max-load (aka -l) flag to GNU make is now useless? What are people doing who are running parallel makefiles on multicore machines?

Recommended answer

My short answer: --max-load is useful if you're willing to invest the time it takes to make good use of it. With its current implementation there's no simple formula to pick good values, or a pre-fab tool for discovering them.

The build I maintain is fairly large. Before I started maintaining it, a full build took 6 hours; now, with -j64 on a ramdisk, it finishes in 5 minutes (30 minutes on an NFS mount with -j12). My goal here was to find reasonable caps for -j and -l that allow our developers to build quickly without making the server (build server or NFS server) unusable for everyone else.

To start out:

  • If you choose a reasonable -jN value (on your machine) and find a reasonable upper bound for load average (on your machine), they work nicely together to keep things balanced.
  • If you use a very large -jN value (or unspecified; eg, -j without a number) and limit the load average, gmake will:
    • continue spawning processes (gmake 3.81 added a throttling mechanism, but that only helps mitigate the problem a little) until the max # of jobs is reached or until the load average goes above your threshold
    • while the load average is over your threshold:
      • do nothing until all sub-processes are finished
      • spawn one job at a time

On Linux at least (and probably other *nix variants), load average is an exponential moving average (see "UNIX Load Average Reweighed", Neil J. Gunther) that represents the average number of processes waiting for CPU time (which can be caused by too many processes, waiting on IO, page faults, etc). Since it's an exponential moving average, it's weighted such that newer samples have a stronger influence on the current value than older samples.
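On Linux, those exponentially damped 1-, 5-, and 15-minute averages are exposed as the first three fields of /proc/loadavg (the file glibc's getloadavg(3) reads), so you can inspect the same numbers make sees:

```shell
# /proc/loadavg fields: "1m 5m 15m running/total last_pid" (Linux-specific;
# other *nixes would go through getloadavg(3) instead).
read one five fifteen rest < /proc/loadavg
echo "1m=$one 5m=$five 15m=$fifteen"
```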

If you can identify a good "sweet spot" for the right max load and number of parallel jobs (through a combination of educated guesses and empirical testing), and assuming you have a long-running build, your 1-minute load average will reach an equilibrium point (it won't fluctuate much). However, if your -jN value is too high for a given max load average, it'll fluctuate quite a bit.

Finding that sweet spot is essentially equivalent to finding optimal parameters to a differential equation. Since it will be subject to initial conditions, the focus is on finding parameters that get the system to stay at equilibrium as opposed to coming up with a "target" load average. By "at equilibrium" I mean: the 1m load average doesn't fluctuate much.
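A crude way to check for that equilibrium is to sample the 1-minute average while a build runs. This is a hypothetical monitoring loop, not part of my build tooling; in practice you'd use a longer interval (e.g. sleep 5) alongside make -jN -lM:

```shell
# Take a few 1-minute load-average samples (short interval for brevity).
samples=""
for i in 1 2 3; do
  l=$(cut -d' ' -f1 /proc/loadavg)
  samples="$samples $l"
  sleep 1
done
echo "1m load samples:$samples"
```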

Assuming you're not bottlenecked by limitations in gmake: when you've found a -jN -lM combination that gives a minimum build time, that combination will be pushing your machine to its limits. If the machine needs to be used for other purposes, you may want to scale it back a bit once you're finished optimizing.

Without regard to load average, the improvements I saw in build time with increasing -jN appeared to be [roughly] logarithmic. That is to say, I saw a larger difference between -j8 and -j12 than between -j12 and -j16.
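That diminishing-returns shape is roughly what Amdahl's law predicts. As a back-of-the-envelope illustration (the 5% serial fraction is an invented number, not something measured on this build):

```shell
# Ideal speedup 1/(s + (1-s)/j) for an assumed serial fraction s = 0.05:
# the j=8 -> j=16 jump gains far more than j=32 -> j=64.
awk 'BEGIN {
  s = 0.05
  for (j = 8; j <= 64; j *= 2)
    printf "j=%-2d speedup=%.1f\n", j, 1 / (s + (1 - s) / j)
}'
```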

Things peaked for me somewhere between -j48 and -j64 (on the Solaris machine it was about -j56) because the initial gmake process is single-threaded; at some point that thread cannot start new jobs faster than they finish.

My tests were performed on:

  • A non-recursive build
    • Recursive builds may see different results; they won't run into the bottleneck I did around -j64.
    • I've done my best to minimize the amount of make-isms (variable expansions, macros, etc) in recipes, because recipe parsing occurs in the same thread that spawns parallel jobs. The more complicated recipes are, the more time make spends in the parser instead of spawning/reaping jobs. For example:
      • No $(shell ...) macros are used in recipes; those are run during the first parsing pass and cached.
      • Most variables are assigned with := to avoid recursive expansion.
  • 256 cores
    • No virtualization/logical domains
    • The build ran on a ramdisk
  • 32 cores (4x hyperthreaded)
    • No virtualization
    • The build ran on a fast local drive
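The two recipe-hygiene points can be illustrated with a hypothetical Makefile fragment (the variable names and the pkg-config call are invented for illustration):

```make
# ':=' (simply expanded): the right-hand side, including $(shell ...)
# and $(wildcard ...), is evaluated once at parse time and cached.
SRCS := $(wildcard src/*.c)
GLIB_CFLAGS := $(shell pkg-config --cflags glib-2.0)

# '=' (recursively expanded) is re-evaluated at every reference, so a
# $(shell ...) here would re-run in the same single thread that spawns
# jobs -- exactly the parser overhead described above:
# BAD_CFLAGS = $(shell pkg-config --cflags glib-2.0)
```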
