Problem description
OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function to take an input - country name - and return the latitude and longitude. I use apply to run the function and it returns a Pandas series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to python and still RTFMing. BTW, the geocoder function works great.
# Import libraries
import os
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim

def locate(x):
    # recent geopy versions reject Nominatim without a custom user_agent; this name is a placeholder
    geolocator = Nominatim(user_agent="my-geocoder")
    # print(x) # debug
    try:
        # Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except Exception:
        # didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
    # print(lat, lon) # debug
    return lat, lon  # Note: also tried return {'LAT': lat, 'LON': lon}

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index()  # works perfectly
df_geo_in['LAT'], df_geo_in['LON'] = df_geo_in.applymap(locate)
# error: returns more than 2 values - default index + column with results
I also tried:
df_geo_in['LAT','LON'] = df_geo_in.applymap(locate)
I get a single dataframe with no index and a single column with the series in it.
我尝试了许多其他方法,包括applymap":
I've tried a number of other methods, including 'applymap':
source_cols = ['LAT','LON']
new_cols = [str(x) for x in source_cols]
df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY'])
df_geo_in[new_cols] = df_geo_in.applymap(locate)
After a long time, it returns the error:
ValueError: Columns must be same length as key
I've also tried manually converting the series to a dataframe using the df.from_dict(df_geo_in) method, without success.
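For comparison: when locate returns a plain tuple, apply produces a Series whose values are (lat, lon) tuples. One way to expand such a Series into two columns is to pass its list of values to the DataFrame constructor and reuse the Series index. A minimal sketch; the countries and coordinates are made up and only stand in for geocoder output:

```python
import pandas as pd

# A Series of (lat, lon) tuples, like the one apply() returns when
# locate() ends with `return lat, lon`. Coordinates are illustrative only.
pairs = pd.Series([(48.85, 2.35), (52.52, 13.40)], index=["France", "Germany"])

# Expand the tuples into two named columns, keeping the original index.
latlon = pd.DataFrame(pairs.tolist(), index=pairs.index, columns=["LAT", "LON"])
print(latlon)
```

Keeping the index means the new columns can later be joined back on the same labels.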
The goal is to geocode 166 unique countries, then join it back to the 188K addresses in df_addr. I'm trying to be pandas-y in my code and not write loops if possible. But I haven't found the magic to convert series into dataframes and this is the first time I've tried to use apply.
Thanks in advance - ancient C programmer
Recommended answer
I'm assuming that df_geo_in is a df with a single column, so I believe the following should work:
Change:
return lat, lon
to:
return pd.Series([lat, lon])
Then you should be able to assign it like so:
df_geo_in[['LAT', 'LON']] = df_geo_in['COUNTRY'].apply(locate)
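A minimal offline sketch of this pattern, applying locate to the COUNTRY column. The geocoder is replaced by a hypothetical dictionary lookup (FAKE_COORDS) so the example runs without network access, and the coordinates are illustrative. Giving the returned Series the index labels LAT and LON goes slightly beyond the answer's version; it makes the columns of the apply result carry those names, so the assignment lines up by name as well as by position.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the geocoder so the example runs offline;
# the coordinates are illustrative, not real geocoding results.
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.0)}

def locate(country):
    # Unknown countries fall back to NaN, like the question's except branch.
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    # Returning a Series (not a tuple) makes Series.apply build a DataFrame.
    return pd.Series([lat, lon], index=["LAT", "LON"])

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Apply to the single column; the two-column result fills LAT and LON.
df_geo_in[["LAT", "LON"]] = df_geo_in["COUNTRY"].apply(locate)
print(df_geo_in)
```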
What you tried to do was assign the result of applymap to two new columns. That is incorrect here, because applymap is designed to work on every element of a df: it returns a df of the same shape as its input, so unless the left-hand side has that same shape, this won't give the desired result.
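A small sketch of that behaviour with a hypothetical element-wise function (fake_locate) returning a tuple. It also shows that unpacking a DataFrame iterates over its column labels rather than its values, which is why the question's first attempt, run on a frame with more than two columns after reset_index, complained about too many values. DataFrame.map replaced applymap in pandas 2.1, so the sketch picks whichever the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"COUNTRY": ["France", "Japan"], "CITY": ["Paris", "Tokyo"]})

# Hypothetical element-wise function returning a (lat, lon)-style tuple.
def fake_locate(x):
    return (len(x), 0.0)

# DataFrame.map replaced applymap in pandas 2.1; fall back for older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(fake_locate)

# Same shape as the input: one tuple per cell, not two new columns.
print(result.shape)   # (2, 2)

# Unpacking a DataFrame iterates over its column labels, not its values.
first, second = result
print(first, second)  # COUNTRY CITY
```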
Your latter method is also incorrect: you drop the duplicate countries and then expect the assignment to put a geolocation back on every country, but the shape of the result doesn't match the two columns you are assigning to.
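A sketch reproducing that mismatch on a hypothetical frame with three columns left after set_index: the element-wise result has three columns, and assigning it to two keys raises a ValueError (in current pandas the message is the "Columns must be same length as key" error from the question). The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical de-duplicated frame with three columns left after set_index,
# similar in spirit to the question's df_geo_in.
df = pd.DataFrame(
    {"CITY": ["Paris", "Tokyo"], "ZIP": ["75001", "100-0001"], "STREET": ["a", "b"]},
    index=pd.Index(["France", "Japan"], name="COUNTRY"),
)

elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(lambda x: (0.0, 0.0))  # one tuple per cell, 3 columns

raised = False
try:
    df[["LAT", "LON"]] = result  # 2 keys but 3 columns: shapes don't match
except ValueError as exc:
    raised = True
    print(exc)
```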
For large dfs it is probably quicker to geocode a de-duplicated df of countries and then merge the result back into your larger df, like so:
geo_lookup = df_addr[['COUNTRY']].drop_duplicates().copy()  # one row per country
geo_lookup[['LAT', 'LON']] = geo_lookup['COUNTRY'].apply(locate)  # locate returns pd.Series
df_geo = df_addr.merge(geo_lookup, on='COUNTRY', how='left')  # merge returns a new df
This creates a df of unique countries with their geolocations, and a left merge then attaches those coordinates back to every row of the master df.
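An end-to-end sketch of the lookup-and-merge approach. The geocoder is again a hypothetical offline dictionary that records each call, and df_addr is a tiny made-up stand-in for the 188K-row address table; the point is that the geocoder runs once per unique country while every address row still gets coordinates.

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for the geocoder; records every call.
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.0)}
calls = []

def locate(country):
    calls.append(country)
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon], index=["LAT", "LON"])

# Made-up address table: many rows, few distinct countries.
df_addr = pd.DataFrame({
    "ADDRESS": [f"addr {i}" for i in range(6)],
    "COUNTRY": ["France", "Japan", "France", "Japan", "France", "Atlantis"],
})

# 1. Geocode each unique country once.
geo_lookup = df_addr[["COUNTRY"]].drop_duplicates().copy()
geo_lookup[["LAT", "LON"]] = geo_lookup["COUNTRY"].apply(locate)

# 2. Left-merge the coordinates back onto every address row.
df_geo = df_addr.merge(geo_lookup, on="COUNTRY", how="left")
print(df_geo)
print(len(calls))  # one geocoder call per unique country
```

With a real geocoder the saving is the difference between one request per address and one per country, which also keeps the number of requests sent to a public service like Nominatim small.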