Allowing search bots to crawl your site without session IDs

Question

Google's Webmaster Guidelines state that search bots should be allowed to crawl a site without session IDs.

My ASP.NET 1.1 site uses custom authentication/authorization and relies pretty heavily on session GUIDs (similar to this approach). I'm worried that allowing traffic that isn't session tracked will either break my existing code or introduce security vulnerabilities.

What best practices are there for letting bots that aren't session tracked crawl a site that normally tracks sessions? And is there any way to detect search bots other than inspecting the user agent? (I don't want people spoofing themselves as Googlebot to get around my session tracking.)

Recommended answer

The correct way to detect bots is by host entry (Dns.GetHostEntry): resolve the requesting IP address to a host name and check the domain. Some lame robots require you to track by IP address, but the popular ones generally don't. Googlebot requests come from *.googlebot.com. After you get the host entry, check IPHostEntry.AddressList to make sure it contains the original IP address; without that second check, anyone who controls the reverse DNS for their own IP range could claim a googlebot.com host name.
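The two-step check described above (look up the host name for the IP, then confirm that host name resolves back to the same IP) can be sketched as follows. This is only an illustration in Python, not the ASP.NET code from the question: the function name `is_verified_googlebot`, its parameters, and the injectable resolver arguments are assumptions made for the example, and `socket.gethostbyname_ex` only covers IPv4. In .NET the same idea would go through Dns.GetHostEntry and IPHostEntry.AddressList.

```python
import socket


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex,
                          allowed_suffixes=(".googlebot.com",)):
    """Return True only if `ip` maps to an allowed host that maps back to `ip`.

    The resolver arguments are hypothetical hooks so the logic can be
    exercised without real DNS traffic; both default to the standard library.
    """
    # Step 1: reverse lookup, IP address -> host name.
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Step 2: the host name must end with an allowed domain suffix.
    # The leading dot keeps "evilgooglebot.com" from matching.
    if not hostname.lower().rstrip(".").endswith(tuple(allowed_suffixes)):
        return False

    # Step 3: forward lookup, host name -> addresses, which must include
    # the original IP (this defeats forged reverse DNS records).
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

A request handler could call `is_verified_googlebot(remote_ip)` and relax session tracking only when it returns True. Because each call may trigger two DNS lookups, caching the result per IP address is probably worthwhile.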

Do not even look at the user agent when verifying robots; it is trivial to spoof.

See also http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html


08-15 14:00