This article covers the PHP file() vs fopen()+fgets() performance debate; it should be a useful reference for anyone weighing the same choice.

Problem description

I am in the process of rewriting some scripts that parse machine-generated logs, porting them from Perl to PHP. The files range from 20 MB to 400 MB. I am trying to decide whether I should use file() or the fopen()+fgets() combo to go through each file, for better performance.

Here is the basic run-through: I check the file size before opening it, and if the file is larger than 100 MB (a pretty rare case, but it does happen from time to time) I go the fopen()+fgets() route, since I only bumped the script's memory limit to 384 MB and any file larger than 100 MB has a chance of causing a fatal error. Otherwise, I use file().
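As a rough illustration, here is a minimal sketch of that branching logic; the log path, the exact 100 MB threshold, and the process() helper are placeholders, not the asker's actual code:

<?php
// Hypothetical log path and threshold for illustration only.
$logfile   = '/var/log/machine.log';
$threshold = 100 * 1024 * 1024; // 100 MB

if (filesize($logfile) > $threshold) {
    // Large file: stream it line by line to stay under the memory limit.
    $fh = fopen($logfile, 'r');
    while (($line = fgets($fh)) !== false) {
        // process($line); // stand-in for the real parsing
    }
    fclose($fh);
} else {
    // Small file: load all lines into an array at once.
    $lines = file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($lines as $line) {
        // process($line);
    }
}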

In both methods I only go through the file once, from beginning to end, line by line.

Here is the question: is it worth keeping the file() part of the code to deal with the small files? I don't know exactly how file() works in PHP (I use the FILE_SKIP_EMPTY_LINES flag as well): does it map the file into memory directly, or does it shove it into memory line by line while going through it? I ran some benchmarks on it; performance is pretty close, with an average difference of about 0.1 s on a 40 MB file, and file() has the advantage over fopen()+fgets() about 80% of the time (out of 200 tests on the same file set).
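For reference, a benchmark along these lines could be sketched as below; the path is a placeholder, and since OS page caching skews single runs, repeated runs over the same file set (as the asker did) are more meaningful than one pass:

<?php
// Hypothetical sample file for the timing comparison.
$logfile = '/var/log/sample-40mb.log';

$t0 = microtime(true);
$lines = file($logfile, FILE_SKIP_EMPTY_LINES); // whole file into an array
$t1 = microtime(true);
unset($lines);

$fh = fopen($logfile, 'r');
while (fgets($fh) !== false) {
    // no-op: just walk the file line by line
}
fclose($fh);
$t2 = microtime(true);

printf("file():        %.4f s\n", $t1 - $t0);
printf("fopen+fgets(): %.4f s\n", $t2 - $t1);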

Dropping the file() part would certainly save me some system memory, and considering I have 3 instances of the same script running at the same time, it could save 1 GB of memory on a 12 GB system that also hosts the database and other things. But I don't want to drag down the script's performance either, since around 10k of these logs come in per day, so a 0.1 s difference really adds up.

Any suggestion would help, and thanks in advance!

Recommended answer

I would suggest sticking with one mechanism, like foreach (new \SplFileObject('file.log') as $line). Split your input files and process them in parallel, 2-3x per CPU core. Bonus: lower priority than the database on the same system. In PHP, this would mean spawning off N copies of the script at once, where each copy has its own file list or directory. Since you're talking about a rewrite and IO performance is an issue, consider other platforms with more capabilities here, e.g. Java 7 NIO, Node.js asynchronous IO, C# TPL.
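A minimal sketch of that single-mechanism approach follows; 'file.log' is the answer's own placeholder name, and the flags are one reasonable configuration (READ_AHEAD is required for SKIP_EMPTY to take effect):

<?php
// SplFileObject streams the file, so peak memory stays flat
// regardless of file size.
$file = new \SplFileObject('file.log');
$file->setFlags(
    \SplFileObject::READ_AHEAD |   // required for SKIP_EMPTY to work
    \SplFileObject::SKIP_EMPTY |   // skip empty lines, like FILE_SKIP_EMPTY_LINES
    \SplFileObject::DROP_NEW_LINE  // strip trailing newlines
);
foreach ($file as $line) {
    // process($line); // hypothetical parsing routine
}

Running N copies of this script, each pointed at its own directory of logs (and niced below the database), gives the parallelism described without any coordination between processes.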

That wraps up this look at the PHP file() vs fopen()+fgets() performance debate; hopefully the recommended answer is helpful.
