Problem description
I am experiencing a weird problem with the Java ProcessBuilder. The code is shown below (in a slightly simplified form):
public class Whatever implements Runnable
{
    public void run() {
        // someIdentifier is a randomly generated string
        String in = someIdentifier + "input.txt";
        String out = someIdentifier + "output.txt";
        ProcessBuilder builder = new ProcessBuilder("./whatever.sh", in, out);
        try {
            Process process = builder.start();
            process.waitFor();
        } catch (IOException e) {
            log.error("Could not launch process. Command: " + builder.command(), e);
        } catch (InterruptedException ex) {
            log.error(ex);
        }
    }
}
whatever.sh reads:
R --slave --args $1 $2 <whatever1.R >> r.log
Loads of instances of Whatever are submitted to an ExecutorService of fixed size (35). The rest of the application waits for all of them to finish, which is implemented with a CountDownLatch. Everything runs fine for several hours (Scientific Linux 5.0, java version "1.6.0_24") before throwing the following exception:
java.io.IOException: Cannot run program "./whatever.sh": java.io.IOException: error=11, Resource temporarily unavailable
at java.lang.ProcessBuilder.start(Unknown Source)
... rest of stack trace omitted...
Does anyone have an idea what this means? Based on the Google/Bing search results for java.io.IOException: error=11, it is not the most common of exceptions, and I am completely baffled.
My wild and not so educated guess is that I have too many threads trying to launch the same file at the same time. However, it takes hours of CPU time to reproduce the problem, so I have not tried with a smaller number.
Any suggestions are greatly appreciated.
Recommended answer
The error=11 is almost certainly the EAGAIN error code:
$ grep EAGAIN asm-generic/errno-base.h
#define EAGAIN 11 /* Try again */
The clone(2) system call documents an EAGAIN error return:
EAGAIN Too many processes are already running.
The fork(2) system call documents two EAGAIN error returns:
EAGAIN fork() cannot allocate sufficient memory to copy the
parent's page tables and allocate a task structure for
the child.
EAGAIN It was not possible to create a new process because
the caller's RLIMIT_NPROC resource limit was
encountered. To exceed this limit, the process must
have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE
capability.
If you were really that low on memory, it would almost certainly show in the system logs. Check dmesg(1) output or /var/log/syslog for any potential messages about low system memory. (Other things would break. This doesn't seem too plausible.)
Much more likely is running into either the per-user limit on processes or the system-wide maximum number of processes. Perhaps one of your processes isn't properly reaping zombies? This would be very easy to spot by checking ps(1) output over time:
while true ; do ps auxw >> ~/processes ; sleep 10 ; done
(Maybe check every minute or ten minutes if it really does take hours before you're in trouble.)
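If you would rather sample from inside the JVM than from a shell loop, something along these lines might work. It is only a sketch under assumptions: it relies on the Linux /proc layout described in proc(5), and the class and method names (ZombieCounter, stateOf, countZombies) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper, Linux only: counts processes whose state in
// /proc/<pid>/stat is 'Z' (zombie).
final class ZombieCounter {

    // Extracts the state letter from one /proc/<pid>/stat line.
    // The command name is wrapped in parentheses and may itself contain
    // spaces or parentheses, so the state is taken after the LAST ')'.
    static char stateOf(String statLine) {
        int close = statLine.lastIndexOf(')');
        if (close < 0 || close + 2 >= statLine.length()) {
            return '?'; // unexpected format
        }
        return statLine.charAt(close + 2);
    }

    // Walks the numeric directories under /proc and counts zombies.
    static int countZombies() {
        int zombies = 0;
        File[] entries = new File("/proc").listFiles();
        if (entries == null) {
            return 0; // /proc not available on this system
        }
        for (File entry : entries) {
            if (!entry.getName().matches("\\d+")) {
                continue; // not a process directory
            }
            try {
                BufferedReader reader =
                        new BufferedReader(new FileReader(new File(entry, "stat")));
                try {
                    String line = reader.readLine();
                    if (line != null && stateOf(line) == 'Z') {
                        zombies++;
                    }
                } finally {
                    reader.close();
                }
            } catch (IOException e) {
                // The process exited between listing and reading; skip it.
            }
        }
        return zombies;
    }

    public static void main(String[] args) {
        System.out.println("zombies: " + countZombies());
    }
}
```

Logging this count every minute or so next to the ps output would show whether defunct children pile up over the hours before the failure.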
If you're not reaping zombies, then read up on whatever you must do to ProcessBuilder to use waitpid(2) to reap your dead children.
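On the Java side, the closest counterpart I know of is Process.waitFor(), which the code in the question already calls. A related thing worth ruling out is the pipe streams that every Process carries: the Process documentation warns that a child can block if its output is not read promptly. The following is a minimal sketch, not a diagnosis of the actual problem; the class and method names (ProcessRunner, runAndReap) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: runs a command, drains its output, waits for it
// to exit, and always releases the pipes attached to the Process.
final class ProcessRunner {

    static int runAndReap(String... command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so one loop drains both
        Process process = builder.start();
        try {
            process.getOutputStream().close(); // child sees EOF on its stdin
            InputStream out = process.getInputStream();
            byte[] buffer = new byte[4096];
            while (out.read(buffer) != -1) {
                // Discard the output; an unread, full pipe could block the child.
            }
            return process.waitFor(); // blocks until exit and returns the exit status
        } finally {
            // Close the remaining pipe ends and make sure the child is gone,
            // even if an exception interrupted the wait.
            process.getInputStream().close();
            process.getErrorStream().close();
            process.destroy();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exit status: " + runAndReap("sh", "-c", "exit 3"));
    }
}
```

In the question's run() method this would amount to replacing the start()/waitFor() pair with a single call such as runAndReap("./whatever.sh", in, out).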
If you're legitimately running more processes than your rlimits allow, you'll need to use ulimit in your bash(1) scripts (if running as root) or set higher limits in /etc/security/limits.conf for the nproc property.
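To see which nproc limit the JVM (and therefore every child it starts) is actually running under, you could read it from inside the application. This sketch assumes a Linux kernel that exposes /proc/self/limits, which may not be the case on older kernels; the class name NprocLimit is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper, Linux only: reads the RLIMIT_NPROC values of the
// current process from /proc/self/limits.
final class NprocLimit {

    // Returns {soft, hard} from the "Max processes" row, or null if the
    // row is missing. Values are kept as strings because either one may
    // be the word "unlimited".
    static String[] maxProcesses() throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("/proc/self/limits"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("Max processes")) {
                    String[] fields = line.substring("Max processes".length())
                                          .trim()
                                          .split("\\s+");
                    return new String[] { fields[0], fields[1] };
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String[] limits = maxProcesses();
        System.out.println(limits == null
                ? "no \"Max processes\" row found"
                : "soft=" + limits[0] + " hard=" + limits[1]);
    }
}
```

Logging these values once at startup makes it easy to confirm later whether a change to limits.conf really reached the running JVM.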
If you are instead running into the system-wide process limits, you might need to write a larger value into /proc/sys/kernel/pid_max. See proc(5) for some (short) details.
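Before changing anything, it may help to check the current value. A minimal sketch, assuming Linux and using a made-up class name (PidMax):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper, Linux only: reads the current system-wide PID ceiling.
final class PidMax {

    static int read() throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("/proc/sys/kernel/pid_max"));
        try {
            String line = reader.readLine();
            if (line == null) {
                throw new IOException("pid_max is empty");
            }
            return Integer.parseInt(line.trim());
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("pid_max: " + read());
    }
}
```

Comparing this number with the process counts collected over time gives a rough idea of how close the system gets to the ceiling.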