Problem description
I need to scrape a large HTML file (e.g. http://www.indianrail.gov.in/mail_express_trn_list.html) using Simple HTML DOM. I started with a simple script:
<?php
require "simple_html_dom.php";
echo file_get_html('http://www.indianrail.gov.in/mail_express_trn_list.html')->plaintext;
?>
This shows nothing, just a blank page, and the following error message appears in the Apache error.log file:
PHP Notice: Trying to get property of non-object in /var/www/index.php on line 3
At the same time, all other pages (e.g. http://www.indianrail.gov.in/special_trn_list.html) work fine with the same script.
Recommended answer
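The issue appears to be MAX_FILE_SIZE, a constant defined in simple_html_dom. When the downloaded page is larger than this limit, file_get_html() appears to return false rather than a DOM object, so reading ->plaintext from the result triggers the "Trying to get property of non-object" notice. That would also explain why smaller pages work with the same script.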
You can adjust it by editing the define('MAX_FILE_SIZE', 600000); line in the simple_html_dom.php file and raising the value above the size of the page you need to load.
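Independently of the size limit, it may help to check the return value before reading a property from it, so that a failed load produces a clear result instead of a blank page. The following is a minimal sketch; plaintext_or_null() is a hypothetical helper written for this illustration and is not part of simple_html_dom, and it assumes that a failed load yields a non-object value such as false.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): takes whatever
// file_get_html() returned and gives back the page text, or null if
// the load failed.
function plaintext_or_null($dom)
{
    // A failed load is assumed to yield a non-object (e.g. false),
    // so check before touching ->plaintext.
    if (!is_object($dom)) {
        return null;
    }
    return $dom->plaintext;
}
```

In the original script this could be used as $text = plaintext_or_null(file_get_html($url)); after the require line, followed by a check for null that logs a message such as "page could not be loaded, possibly larger than MAX_FILE_SIZE".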