PHP file_exists() returns false for URL/robots.txt

Question

I tried to use file_exists(URL/robots.txt) to see whether the file exists on randomly chosen websites, but I get a false response.

How do I check whether the robots.txt file exists?

I don't want to start the download before I check.

Would fopen() do the trick? I ask because it returns a file pointer resource on success, or FALSE on error.

I guess I could write something like:

$f=@fopen($url,"r");
if($f) ...

Output:

http://www1.macys.com/robots.txt
maybe it's not there
http://www.intend.ro/robots.txt
maybe it's not there
http://www.emag.ro/robots.txt
maybe it's not there
http://www1.bloomingdales.com/robots.txt
maybe it's not there

My code:

try {
    // $file holds the remote robots.txt URL, e.g. http://www.emag.ro/robots.txt
    if (file_exists($file)) {
        echo 'exists' . PHP_EOL;
        $curl_tool = new CurlTool();
        $content = $curl_tool->fetchContent($file);
        // path of the local copy (the original unlink() call used $website instead of $website_id)
        $local_path = CRAWLER_FILES . 'robots_' . $website_id . '.txt';
        // if the file exists on local disk, delete it
        if (file_exists($local_path)) {
            unlink($local_path);
        }
        echo $local_path, $content . PHP_EOL;
        file_put_contents($local_path, $content);
    } else {
        echo 'maybe it\'s not there' . PHP_EOL;
    }
} catch (Exception $e) {
    echo 'EXCEPTION ' . $e . PHP_EOL;
}

Recommended answer

file_exists() cannot be used on resources on other websites. It is intended for the local filesystem. Have a look here for how to perform the check properly.
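To illustrate the difference, here is a minimal sketch that calls file_exists() once on a local temporary file and once on one of the URLs from the question. The variable names are made up for this example. The remote call reports false because PHP's http:// stream wrapper does not support stat(), which file_exists() relies on.

```php
<?php
// Local path: file_exists() can stat() the filesystem directly.
$localPath = tempnam(sys_get_temp_dir(), 'robots_'); // creates an empty temp file
$localExists = file_exists($localPath);

// Remote URL: the http:// wrapper does not implement stat(),
// so file_exists() reports false even if the server has the file.
$remoteExists = file_exists('http://www.emag.ro/robots.txt');

var_dump($localExists, $remoteExists);

// Clean up the temporary file created above.
unlink($localPath);
```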

As others have mentioned in the comments, and as the link says, it is (probably) easiest to use the get_headers() function to do this:

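// get_headers() returns an array of header lines (or false on failure);
// element 0 is the status line, e.g. "HTTP/1.1 404 Not Found"
$headers = @get_headers($url);
if ($headers === false || strpos($headers[0], "404") !== false) {
    // the request failed or robots.txt is missing
    ... your code ...
} else {
    // robots.txt is reachable
    ... you get the idea ...
}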
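Since the question asks to avoid starting a download, one possible refinement is to send a HEAD request so that only headers are transferred. The sketch below is not part of the original answer: the function names statusCodeFromLine, finalStatusCode and remoteFileExists are hypothetical, the five-second timeout is an arbitrary choice, and passing a stream context to get_headers() assumes PHP 7.1 or later.

```php
<?php
// Extract the numeric status code from a status line such as "HTTP/1.1 200 OK".
// Returns null when the line is not a recognizable HTTP status line.
function statusCodeFromLine(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Return the status code of the last response in a header list.
// get_headers() follows redirects and appends the headers of every response,
// so the last status line describes the resource that was finally reached.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $header) {
        if (is_string($header) && ($c = statusCodeFromLine($header)) !== null) {
            $code = $c;
        }
    }
    return $code;
}

// Hypothetical helper: true when a HEAD request for $url ends in a 200 response.
function remoteFileExists(string $url, int $timeoutSeconds = 5): bool
{
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'timeout' => $timeoutSeconds],
    ]);
    // The third (context) argument requires PHP 7.1+.
    $headers = @get_headers($url, false, $context);
    return $headers !== false && finalStatusCode($headers) === 200;
}
```

It could be called as remoteFileExists('http://www.emag.ro/robots.txt'). Some servers reject HEAD requests or answer them differently from GET, so a false result here may be worth a fallback check with a normal request.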

