Problem description
I've run into something strange when using Simple HTML DOM to parse a webpage with certain query strings. When parsing the used car page of a dealership's website, some query strings work and others do not. Whenever there are more vehicles left to show, it does not return the HTML content (that is, it works on the last page of the pagination but not on the earlier ones). I've tried viewing the page with JavaScript disabled to see whether the markup differs, but the page seems to behave the same way. The code is below if anyone has any ideas or, better yet, solutions. Thanks all!
require 'simple_html_dom.php';
error_reporting(E_ALL);
$startingURL = 'http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=2';
// file_get_html() returns a DOM object on success and false on failure.
$getHTML = file_get_html($startingURL);
if ($getHTML !== false) {
    echo '<h1>TRUE</h1>';
} else {
    echo '<h1>FALSE</h1>';
}
var_dump($getHTML);
With the URL above, var_dump shows a boolean false. With the following URL, I can parse the data without any issue: http://www.buickgmcofmilford.com/VehicleSearchResults?model=&certified=&location=&miles=&maxPrice=&minYear=&maxYear=&bodyType=&search=preowned&trim=&make=&pageNumber=5
Thanks.
You should not use the default function file_get_html to fetch remote content. That function uses file_get_contents to download the page, and the target website will sometimes block a request based on its user agent or referer. You could try downloading the page content with PHP cURL first, then parsing it with simple_html_dom.
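The suggestion above could be sketched roughly as follows. This is only an illustration, not the library's own method: fetchPage() and loadDom() are hypothetical helper names, the user agent string is an example value, and the timeouts are arbitrary. The only Simple HTML DOM call used is str_get_html(), which parses a string that has already been downloaded; it assumes simple_html_dom.php has been loaded by the calling script.

```php
<?php
// Hypothetical helper: download a page with cURL and return its body,
// or false if the transfer fails or the status is not 2xx.
function fetchPage(string $url, int $timeoutSeconds = 30, string $referer = '')
{
    $ch = curl_init($url);
    $options = [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        // Example value only; some sites reject requests without a browser-like agent.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ];
    if ($referer !== '') {
        $options[CURLOPT_REFERER] = $referer;
    }
    curl_setopt_array($ch, $options);

    $body = curl_exec($ch);
    if ($body === false) {
        // Log the transport error so a failure is not silent.
        error_log('cURL error: ' . curl_error($ch));
        return false;
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($status >= 200 && $status < 300) ? $body : false;
}

// Hypothetical helper: fetch the page, then parse it with Simple HTML DOM.
// Assumes the caller has already run: require 'simple_html_dom.php';
function loadDom(string $url)
{
    $markup = fetchPage($url);
    if ($markup === false) {
        return false;
    }
    // Like file_get_html(), str_get_html() returns false when it cannot parse.
    return str_get_html($markup);
}
```

With simple_html_dom.php loaded, calling loadDom($startingURL) would take the place of file_get_html($startingURL) in the question's code, and the result can still be checked against false. Separating the download from the parse also makes it easier to tell which of the two steps is failing.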