file_get_contents => PHP Fatal error: Allowed memory exhausted

Question

I have no experience dealing with large files, so I am not sure what to do about this. I have attempted to read several large files using file_get_contents; the task is to clean and munge them using preg_replace().

My code runs fine on small files; however, the large files (40 MB) trigger a memory exhausted error:

PHP Fatal error:  Allowed memory size of 16777216 bytes exhausted (tried to allocate 41390283 bytes)

I was thinking of using fread() instead, but I am not sure that will work either. Is there a workaround for this problem?

Thanks for your input.

Here is my code:

<?php
error_reporting(E_ALL);

##get find() results and remove DOS carriage returns.
##The error is thrown on the next line for large files!
$myData = file_get_contents("tmp11");
$newData = str_replace("^M", "", $myData);

##cleanup Model-Manufacturer field.
$pattern = '/(Model-Manufacturer:)(\n)(\w+)/i';
$replacement = '$1$3';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup Test_Version field and create comma delimited layout.
$pattern = '/(Test_Version=)(\d).(\d).(\d)(\n+)/';
$replacement = '$1$2.$3.$4      ';
$newData = preg_replace($pattern, $replacement, $newData);

##cleanup occasional empty Model-Manufacturer field.
$pattern = '/(Test_Version=)(\d).(\d).(\d)      (Test_Version=)/';
$replacement = '$1$2.$3.$4      Model-Manufacturer:N/A--$5';
$newData = preg_replace($pattern, $replacement, $newData);

##fix occasional Model-Manufacturer being incorrectly wrapped.
$newData = str_replace("--","\n",$newData);

##fix 'Binary file' message when find() utility cannot id file.
$pattern = '/(Binary file).*/';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);
$newData = removeEmptyLines($newData);

##replace colon with equal sign
$newData = str_replace("Model-Manufacturer:","Model-Manufacturer=",$newData);

##file stuff
$fh2 = fopen("tmp2","w");
fwrite($fh2, $newData);
fclose($fh2);

### Functions.

##Data cleanup
function removeEmptyLines($string)
{
        return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>

Recommended answer

First, you should understand that when you use file_get_contents you are fetching the entire file into a single string variable, and that variable is stored in the host's memory.

If that string is larger than the memory limit allotted to the PHP process, PHP halts and displays the error message above. In the error quoted in the question, the limit is 16777216 bytes (16 MB), while the file needs 41390283 bytes (about 40 MB).
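As a rough illustration of that limit, the sketch below compares a file's size with the configured memory_limit before reading it. The helper names shorthand_to_bytes and file_fits_in_memory are hypothetical (they are not part of PHP or of the original answer), and the check is optimistic: string operations such as str_replace and preg_replace create additional copies, so the real requirement can be several times the file size.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand value such as "16M"
// into a number of bytes. Handles only the K, M and G suffixes.
function shorthand_to_bytes(string $value): int
{
    $value = trim($value);
    $number = (int) $value;                 // "16M" casts to 16
    switch (strtolower(substr($value, -1))) {
        case 'g': return $number * 1024 * 1024 * 1024;
        case 'm': return $number * 1024 * 1024;
        case 'k': return $number * 1024;
        default:  return $number;           // plain byte count, or -1
    }
}

// Hypothetical helper: true if reading $file in one go is likely to stay
// under memory_limit. A limit of -1 means "no limit".
function file_fits_in_memory(string $file): bool
{
    $limit = shorthand_to_bytes((string) ini_get('memory_limit'));
    if ($limit <= 0) {
        return true;
    }
    $size = filesize($file);
    if ($size === false) {
        return false;                       // file missing or unreadable
    }
    return memory_get_usage() + $size < $limit;
}
```

With the values from the question, shorthand_to_bytes("16M") gives 16777216, which is smaller than the 41390283 bytes that file_get_contents tried to allocate.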

The way around this is to open the file as a pointer and then take one chunk at a time. This way, if you had a 500 MB file, you could read the first 1 MB of data, do what you will with it, drop that 1 MB from memory, and replace it with the next 1 MB. This lets you control how much data you hold in memory at any moment.

An example of this can be seen below; I will create a callback-based function that acts like node.js.

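function file_get_contents_chunked($file,$chunk_size,$callback)
{
    try
    {
        $handle = fopen($file, "r");
        if ($handle === false)
        {
            // fopen() returns false on failure instead of throwing,
            // so check here before feof() is called on an invalid handle.
            trigger_error("file_get_contents_chunked::could not open " . $file,E_USER_NOTICE);
            return false;
        }

        $i = 0;
        while (!feof($handle))
        {
            // Pass the chunk, the handle (by reference) and the chunk index.
            call_user_func_array($callback,array(fread($handle,$chunk_size),&$handle,$i));
            $i++;
        }

        fclose($handle);

    }
    catch(Exception $e)
    {
         // Reached only if the callback itself throws an Exception.
         trigger_error("file_get_contents_chunked::" . $e->getMessage(),E_USER_NOTICE);
         return false;
    }

    return true;
}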

It can then be used like this:

$success = file_get_contents_chunked("my/large/file",4096,function($chunk,&$handle,$iteration){
    /*
        * Do what you will with the {$chunk} here.
        * {$handle} is passed in case you want to seek
        * to different parts of the file.
        * {$iteration} is the index of the chunk that has been read, so
        * ($iteration * 4096) is the offset at which this chunk starts.
    */

});

if(!$success)
{
    //It Failed
}

One of the problems you will find is that you are running several regular expressions over an extremely large block of data. Not only that, but your regular expressions are built to match against the entire file.

With the above method your regular expressions could become useless, because a chunk may contain only half of the data a pattern needs to match. What you should do instead is revert to the native string functions such as

  • strpos
  • substr
  • trim
  • explode

for matching the strings. I have added support in the callback so that the handle and the current iteration are passed to it. This lets you work with the file directly within your callback, using functions like fseek, ftruncate and fwrite, for instance.
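To make the half-matched-data problem concrete, here is a minimal sketch of one way to deal with chunk boundaries: keep the incomplete last line of each chunk and prepend it to the next one, so that every line is handed over whole. The function name process_lines_chunked and its callback are hypothetical and not part of the original answer; the sketch assumes the data is line-oriented with "\n" or "\r\n" line endings.

```php
<?php
// Hypothetical helper: reads $file in chunks of $chunk_size bytes and calls
// $onLine once per complete line, so no line is ever split across chunks.
function process_lines_chunked(string $file, int $chunk_size, callable $onLine): bool
{
    $handle = fopen($file, "r");
    if ($handle === false) {
        return false;
    }

    $carry = '';                             // incomplete line from the previous chunk
    while (!feof($handle)) {
        $chunk = fread($handle, $chunk_size);
        if ($chunk === false) {
            break;
        }
        $buffer = $carry . $chunk;
        $lastNewline = strrpos($buffer, "\n");
        if ($lastNewline === false) {
            $carry = $buffer;                // no complete line yet, keep accumulating
            continue;
        }
        // Everything before the last newline consists of complete lines.
        $complete = substr($buffer, 0, $lastNewline);
        $carry = substr($buffer, $lastNewline + 1);
        foreach (explode("\n", $complete) as $line) {
            $onLine(rtrim($line, "\r"));     // also drops DOS carriage returns
        }
    }
    fclose($handle);

    if ($carry !== '') {
        $onLine(rtrim($carry, "\r"));        // final line without a trailing newline
    }
    return true;
}
```

Only the current chunk plus one partial line is held in memory at a time. Patterns that span two lines, such as the Model-Manufacturer field followed by its value on the next line, would still need the callback to remember the previous line.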

The way you have built your string manipulation is not efficient, since it holds the whole file in memory and rescans it for every replacement; the method proposed above is a much better way.
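As a sketch of what that could look like for the single-line parts of the cleanup in the question, the function below streams the input to the output one line at a time using only native string functions. The name clean_file_streaming is hypothetical, and the sketch covers only three of the question's steps (removing carriage returns and empty lines, cutting 'Binary file' messages, and replacing the colon); the multi-line Test_Version and Model-Manufacturer fixes are deliberately left out.

```php
<?php
// Hypothetical helper: copies $in to $out line by line, cleaning each line
// on the way, so memory use does not grow with the size of the file.
function clean_file_streaming(string $in, string $out): bool
{
    $src = fopen($in, "r");
    if ($src === false) {
        return false;
    }
    $dst = fopen($out, "w");
    if ($dst === false) {
        fclose($src);
        return false;
    }

    while (($line = fgets($src)) !== false) {
        $line = rtrim($line, "\r\n");              // strip DOS and Unix line endings

        // Cut a 'Binary file ...' message from where it starts to end of line.
        $pos = strpos($line, "Binary file");
        if ($pos !== false) {
            $line = substr($line, 0, $pos);
        }

        if (trim($line) === '') {
            continue;                               // drop empty lines
        }

        // Replace the colon with an equal sign, as in the question's code.
        $line = str_replace("Model-Manufacturer:", "Model-Manufacturer=", $line);
        fwrite($dst, $line . "\n");
    }

    fclose($src);
    fclose($dst);
    return true;
}
```

In the question's setting this might be called as clean_file_streaming("tmp11", "tmp2"), writing the output as it goes instead of building one large $newData string.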

Hope this helps.

09-03 18:48