Problem description
I am a relatively new Python user and am trying to write a function that returns the latitude and longitude for a city and country using the "geopy" module. I initially had errors because some city names were misspelled, which I have managed to catch. The trouble now is that I am getting a timeout error. I have read the question "Geopy: catch timeout error" and adjusted my timeout parameter accordingly. However, the script now runs for varying lengths of time before the timeout error occurs. I have tried running it over faster networks, which helps to some degree. The problem is that I need to do this for 100k rows, and the most rows it has processed before timing out is 20k. Any help or advice on how to solve this is greatly appreciated.
import os
from geopy.geocoders import Nominatim
os.getcwd() #check current working directory
os.chdir("C:\Users\Philip\Documents\HDSDA1\Project\Global Terrorism Database")
#import file as a csv
import csv
gtd=open("gtd_original.csv","r")
csv_f=csv.reader(gtd)
outf=open("r_ready.csv","wb")
writer=csv.writer(outf,dialect='excel')
for row in csv_f:
    if row[13] in ("","NA") or row[14] in ("","NA"):
        lookup = row[12] + "," + row[8] # creates a city,country
        geolocator = Nominatim()
        location = geolocator.geocode(lookup, timeout = None) #looks up the city/country on maps
        try:
            location.latitude
        except:
            lookup = row[8]
            location = geolocator.geocode(lookup)
        row[13] = location.latitude
        row[14] = location.longitude
    writer.writerow(row)
gtd.close()
outf.close()
Recommended answer
I expect that you exceeded the usage policy for the Nominatim service (http://wiki.openstreetmap.org/wiki/Nominatim_usage_policy). Try sleeping for 1 second between requests and caching the results; there are probably a lot of duplicates.
Sleep part:
from time import sleep
### your code
        row[14] = location.longitude
        sleep(1) # after last line in if
Caching:
coords = {}
if (row[8], row[12]) in coords: # a tuple is used as the key because lists are not hashable
    row[13], row[14] = coords[(row[8], row[12])]
else:
    pass # geolocate here, then store the result in coords
Update
Performance: 1 request/sec --> 3,600 requests/hour --> 36K requests per 10 hours
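The arithmetic above can be turned into a small helper to estimate how long a run will spend waiting. This is only a sketch: `estimated_hours` is a hypothetical name, and it counts only the pause between requests, not the time each request itself takes. With caching, `n_requests` would be the number of distinct (country, city) pairs that still need a lookup rather than the number of rows.

```python
def estimated_hours(n_requests, delay_seconds=1.0):
    """Lower bound on the hours spent pausing between requests."""
    return n_requests * delay_seconds / 3600.0


print(estimated_hours(36000))             # 10.0, the figure quoted above
print(round(estimated_hours(100000), 1))  # 27.8, all 100k rows with no cache hits
```

So without any cache hits, the full 100k rows would need more than a day at one request per second, which is why caching duplicates matters here.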
import os
from time import sleep
from geopy.geocoders import Nominatim
os.getcwd() #check current working directory
os.chdir(r"C:\Users\Philip\Documents\HDSDA1\Project\Global Terrorism Database") # raw string so backslashes are not treated as escapes
#import file as a csv
import csv
gtd=open("gtd_original.csv","r")
csv_f=csv.reader(gtd)
outf=open("r_ready.csv","wb")
writer=csv.writer(outf,dialect='excel')
coords = {} # cache: (country, city) -> (latitude, longitude)
for row in csv_f:
    if row[13] in ("","NA") or row[14] in ("","NA"):
        lookup = row[12] + "," + row[8] # creates a city,country
        if (row[8], row[12]) in coords: ## test if result is already cached
            row[13] , row[14] = coords[ (row[8], row[12]) ]
        else:
            geolocator = Nominatim()
            location = geolocator.geocode(lookup, timeout = None) #looks up the city/country on maps
            try:
                location.latitude
            except:
                lookup = row[8] # fall back to the country alone
                location = geolocator.geocode(lookup)
            row[13] = location.latitude
            row[14] = location.longitude
            coords[ (row[8], row[12]) ] = (row[13] , row[14]) # cache the new coords
            sleep(1) # sleep for 1 sec (required by Nominatim usage policy)
    writer.writerow(row)
gtd.close()
outf.close()
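Even with the sleep and the cache, an occasional request may still time out over a long run. One possible addition, not part of the original answer, is to retry a failed lookup a few times with a growing pause instead of letting the whole script stop. The sketch below is an assumption-laden illustration: `geocode_with_retry` and its parameters are hypothetical names, and the geocoder is passed in as a plain callable so that the example runs without network access. With geopy, one would presumably pass `geolocator.geocode` and set `retry_on` to its timeout exception (`GeocoderTimedOut` in `geopy.exc`, as far as I know).

```python
import time


def geocode_with_retry(geocode, query, retry_on=(Exception,),
                       max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call geocode(query), retrying with a growing delay on failure.

    geocode  -- any callable taking a query string (hypothetical stand-in
                for something like geolocator.geocode)
    retry_on -- tuple of exception classes that should trigger a retry
    Returns whatever geocode returns, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        try:
            return geocode(query)
        except retry_on:
            if attempt == max_attempts - 1:
                return None
            # wait base_delay, then 2x, then 4x ... before the next attempt
            sleep(base_delay * 2 ** attempt)
    return None


# Stand-in for a real geocoder: fails twice, then answers.
calls = {"n": 0}

def flaky(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("simulated timeout")
    return (53.35, -6.26)

waits = []  # records the requested pauses instead of actually sleeping
result = geocode_with_retry(flaky, "Dublin,Ireland",
                            retry_on=(IOError,), sleep=waits.append)
print(result)
print(waits)
```

Returning None after the last attempt lets the caller decide what to do with a row that could not be geocoded, for example writing it out unchanged and moving on.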