Problem Description
I'm dealing with large XML files (several megabytes) on which I have to perform various kinds of checks. However, I have a problem with memory and time usage, which grow very quickly. I've tested it like this:
$xml = new SimpleXMLElement($string);
$sum_of_elements = 0.0;
foreach ($xml->xpath('//Amt') as $amt) {
    $sum_of_elements += (double)$amt;
}
Using the microtime() and memory_get_usage() functions, I get the following results from running this code:
- 5MB file (7480 Amt elements):
- execution time 0.69s
- memory usage grows from 10.25MB to 29.75MB
That's still quite OK. But then, with a slightly bigger file, memory and time usage grow dramatically:
- 6MB file (8976 Amt elements):
- execution time 8.53s
- memory usage grows from 10.25MB to 99.25MB
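For reference, the figures above can be reproduced with a small measurement wrapper around the test code (a minimal sketch; the variable $string holding the XML document is assumed from the question, and 1048576 converts bytes to MB):

```php
<?php
$time_start = microtime(true);
$mem_start  = memory_get_usage();

// Test code from the question.
$xml = new SimpleXMLElement($string);
$sum_of_elements = 0.0;
foreach ($xml->xpath('//Amt') as $amt) {
    $sum_of_elements += (double)$amt;
}

printf("execution time %.2fs\n", microtime(true) - $time_start);
printf("memory usage grows from %.2fMB to %.2fMB\n",
       $mem_start / 1048576, memory_get_usage() / 1048576);
```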
The problem seems to be in looping over the result set. I've also tried a for loop instead of foreach, but with no difference. Without the loop, memory usage does not grow nearly as much.
Any idea where the problem could be?
Recommended Answer
SimpleXML is tree-based and will load the entire document into memory. Using unset during the loop to mark no-longer-needed resources for cleanup by PHP's garbage collector might yield lower memory usage. If that doesn't solve the issue, consider using XMLReader for a pull-based approach. Although you won't be able to use XPath, memory consumption should be significantly lower.