Problem description
I am trying to do a batch search: go over a list of strings and print the first address that Google search returns for each one:
#!/usr/bin/python
import json
import urllib
import time
import pandas as pd
df = pd.read_csv("test.csv")
saved_column = df.Name  # you can also use df['column_name']
for name in saved_column:
    query = urllib.urlencode({'q': name})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
    search_response = urllib.urlopen(url)
    search_results = search_response.read()
    results = json.loads(search_results)
    data = results['responseData']
    address = data[u'results'][0][u'url']
    print address
I get a 403 error from the server: 'Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors', u'responseStatus': 403
Is what I'm doing not allowed under Google's terms of service?
I also tried putting time.sleep(5) in the loop, but I get the same error.
Thank you in advance
Not allowed by Google TOS. You really can't scrape Google without them getting angry. Their blocker is also pretty sophisticated, so you can get around it for a little while with random delays, but that fails pretty quickly.
Sorry, you're out of luck on this one.
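Separately from the TOS question, the loop in the question indexes into responseData without looking at responseStatus, so a blocked request will also crash the script. Here is a minimal sketch of a guard, assuming the response shape suggested by the error in the question. The helper name first_result_url and the sample dictionaries are hypothetical, and the responseDetails key is an assumption about where the error message lives. This does not get around the block; it only makes the failure visible.

```python
def first_result_url(results):
    """Return the first result URL, or None if the response has no usable results."""
    # A blocked request reports a non-200 responseStatus (403 in the question).
    if results.get('responseStatus') != 200:
        return None
    # responseData may be missing or null, so fall back to empty containers.
    data = results.get('responseData') or {}
    hits = data.get('results') or []
    return hits[0].get('url') if hits else None


# Hypothetical sample responses, shaped like the parsed JSON in the question.
blocked = {'responseData': None,
           'responseDetails': 'Suspected Terms of Service Abuse.',
           'responseStatus': 403}
ok = {'responseData': {'results': [{'url': 'http://example.com/'}]},
      'responseStatus': 200}

print(first_result_url(blocked))
print(first_result_url(ok))
```

In the loop, this would replace the two lines that read data and address, and the script could skip or log any name for which the helper returns None.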