Problem description
I'm running a Linux 2.6.36 kernel, and I'm seeing some random errors. Things like:

ls: error while loading shared libraries: libpthread.so.0: cannot open shared object file: Error 23

Yes, my system can't consistently run an 'ls' command. :(

I note several errors in my dmesg output:
# dmesg | tail
[2808967.543203] EXT4-fs (sda3): re-mounted. Opts: (null)
[2837776.220605] xv[14450] general protection ip:7f20c20c6ac6 sp:7fff3641b368 error:0 in libpng14.so.14.4.0[7f20c20a9000+29000]
[4931344.685302] EXT4-fs (md16): re-mounted. Opts: (null)
[4982666.631444] VFS: file-max limit 1231582 reached
[4982666.764240] VFS: file-max limit 1231582 reached
[4982767.360574] VFS: file-max limit 1231582 reached
[4982901.904628] VFS: file-max limit 1231582 reached
[4982964.930556] VFS: file-max limit 1231582 reached
[4982966.352170] VFS: file-max limit 1231582 reached
[4982966.649195] top[31095]: segfault at 14 ip 00007fd6ace42700 sp 00007fff20746530 error 6 in libproc-3.2.8.so[7fd6ace3b000+e000]
Obviously, the file-max errors look suspicious, being clustered together and recent.
# cat /proc/sys/fs/file-max
1231582
# cat /proc/sys/fs/file-nr
1231712 0 1231582
That also looks a bit odd to me, but the thing is, there's no way I have 1.2 million files open on this system. I'm the only one using it, and it's not visible to anyone outside the local network.
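For reference, the three fields of /proc/sys/fs/file-nr are the number of allocated file handles, the number allocated but currently unused, and the system-wide maximum (the same value exposed as file-max). A minimal sketch, assuming a Linux /proc layout, that derives the in-use count and remaining headroom:

```shell
# Sketch: decode /proc/sys/fs/file-nr (Linux-specific).
# Fields: allocated handles, allocated-but-unused handles, maximum.
read -r allocated unused maximum < /proc/sys/fs/file-nr
echo "in use:   $((allocated - unused))"
echo "headroom: $((maximum - allocated))"
```

In the numbers above, allocated handles (1231712) already exceed the maximum (1231582), so the headroom is negative; that is exactly why new opens fail with errors like the ls failure at the top.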
# lsof | wc
16046 148253 1882901
# ps -ef | wc
574 6104 44260
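One way to cross-check the lsof count is to walk /proc directly; a sketch, assuming Linux's /proc layout. Note that, like lsof, this only sees descriptors held by userspace processes, so files pinned open inside the kernel itself will not appear in either view:

```shell
# Sketch: per-process open-descriptor counts straight from /proc.
# Prints the ten processes holding the most fds, highest first.
for pid in /proc/[0-9]*; do
    n=$(ls "$pid/fd" 2>/dev/null | wc -l)
    [ "$n" -gt 0 ] && printf '%s %s\n' "$n" "${pid##*/}"
done | sort -rn | head
```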
I saw some documentation saying:

My first reading of this is that the kernel basically has a built-in file descriptor leak, but I find that very hard to believe. It would imply that any system in active use needs to be rebooted every so often to free up file descriptors. As I said, I can't believe this would be true, since it's normal for my Linux systems to stay up for months (even years) at a time. On the other hand, I also can't believe that my nearly-idle system is holding over a million files open.

Does anyone have any ideas, either for fixes or further diagnosis? I could, of course, just reboot the system, but I don't want this to become a recurring problem every few weeks. As a stopgap measure, I've quit Firefox, which was accounting for almost 2000 lines of lsof output (!) even though I only had one window open, and now I can run 'ls' again, but I doubt that will fix the problem for long. (edit: Oops, spoke too soon. By the time I finished typing out this question, the symptom was back.)
Thanks in advance for any help.

I hate to leave a question open, so here's a summary for anyone who finds this.
I ended up reposting the question on serverfault instead (this article). They weren't able to come up with anything, actually, but I did some more investigation and ultimately found that it's a genuine bug in NFSv4, specifically the server-side locking code. I had an NFS client which was running a monitoring script every 5 seconds, using rrdtool to log some data to an NFS-mounted file. Every time it ran, it locked the file for writing, and the server allocated (but erroneously never released) an open file descriptor. That script (plus another that ran less frequently) consumed about 900 open files per hour, and two months later it hit the limit.

Several solutions are possible:

1) Use NFSv3 instead.
2) Stop running the monitoring script.
3) Store the monitoring results locally instead of on NFS.
4) Wait for the patch to NFSv4 that fixes this (Bruce Fields actually sent me a patch to try, but I haven't had time).

I'm sure you can think of other possible solutions. Thanks for trying.
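As a sanity check on that explanation, the arithmetic works out: at roughly 900 leaked files per hour, a file-max of 1231582 lasts about 57 days, which matches the two-month timeline:

```shell
# Leak rate (~900 files/hour) vs. the file-max ceiling.
echo $((1231582 / 900))        # hours until exhaustion: 1368
echo $((1231582 / 900 / 24))   # days: 57, i.e. about two months
```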
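For illustration, the client-side pattern that triggered the leak can be sketched as below. The mount point, file name, and the use of flock(1) are hypothetical stand-ins (the real setup used rrdtool, which locks its file internally on each update); on an affected NFSv4 server, each lock grant left one more file allocated server-side:

```shell
# Hypothetical sketch of the triggering pattern (paths are made up).
# Every 5 seconds: take an exclusive lock on an NFSv4-mounted file
# and append to it. On affected servers each lock leaked one open file.
while sleep 5; do
    flock /mnt/nfs/stats.rrd -c 'date >> /mnt/nfs/stats.rrd'
done
```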