Problem description
I am quite new to Python. I have the following table in Postgres. Each Id describes a polygon given by four corner coordinates, and all four rows share the same Zone name. I have stored this data in a pandas DataFrame called df1.
Id Order Lat Lon Zone
00001 1 50.6373473 3.075029928 A
00001 2 50.63740441 3.075068636 A
00001 3 50.63744285 3.074951754 A
00001 4 50.63737839 3.074913884 A
00002 1 50.6376054 3.0750528 B
00002 2 50.6375896 3.0751209 B
00002 3 50.6374239 3.0750246 B
00002 4 50.6374404 3.0749554 B
I also have JSON data with Lat and Lon values, which I have stored in a pandas DataFrame called df2.
Lat Lon
50.6375524099 3.07507914474
50.6375714407 3.07508201591
My task is to compare the Lat and Lon values in df2 with the four corner coordinates of each zone in df1, extract the matching zone name, and add it to df2.
For example, (50.6375524099, 3.07507914474) belongs to Zone B.
import numpy as np
import pandas as pd
import matplotlib.path as mplPath

# `engine` is the existing database connection to Postgres (not shown here)
# Polygon corner points with Id and Zone
df1 = pd.read_sql_query('SELECT * FROM "zmap"', con=engine)
# Points with lat, lon values
df2 = pd.read_sql_query('SELECT * FROM "E1"', con=engine)
# zip() is lazy in Python 3, so materialize it before assigning the column
df2['latlon'] = list(zip(df2.lat, df2.lon))
zones = [
    ["A", [[50.637347297, 3.075029928], [50.637404408, 3.075068636],
           [50.637442847, 3.074951754], [50.637378390, 3.074913884]]],
]
for i in range(len(zones)):  # for each zone's polygon
    X = mplPath.Path(np.array(zones[i][1]))
    # Test which points fall inside the current zone
    Y = X.contains_points(df2.latlon.values.tolist())
    # Label the points that are in the current zone
    df2.loc[Y, 'zone'] = zones[i][0]
Currently I have done this manually for zone "A" only. I need to generate the zones for all coordinates in df2.
Recommended answer
This sounds like a good use case for scipy's cdist: for each point in df2, find the closest polygon corner in df1 and take that corner's zone.
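As a quick illustration of what cdist returns, here is a minimal sketch with made-up coordinates (the arrays `queries` and `candidates` are hypothetical and not part of the question's data). It computes all pairwise Euclidean distances at once and picks the nearest candidate for each query point. Note that Euclidean distance on raw latitude/longitude degrees is only an approximation of true ground distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical sample data, for illustration only
queries = np.array([[0.0, 0.0], [5.0, 5.0]])
candidates = np.array([[1.0, 0.0], [4.0, 5.0], [10.0, 10.0]])

# dist[i, j] is the Euclidean distance from queries[i] to candidates[j]
dist = cdist(queries, candidates)

# Index of the nearest candidate for each query point
nearest = dist.argmin(axis=1)
print(nearest)
```

Calling cdist once on the full arrays avoids recomputing distances row by row, which may matter when df2 is large.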
import pandas as pd
from scipy.spatial.distance import cdist

data1 = {'Lat': pd.Series([50.6373473,50.63740441,50.63744285,50.63737839,50.6376054,50.6375896,50.6374239,50.6374404]),
         'Lon': pd.Series([3.075029928,3.075068636,3.074951754,3.074913884,3.0750528,3.0751209,3.0750246,3.0749554]),
         'Zone': pd.Series(['A','A','A','A','B','B','B','B'])}
data2 = {'Lat': pd.Series([50.6375524099,50.6375714407]),
         'Lon': pd.Series([3.07507914474,3.07508201591])}

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df1['point'] = [(x, y) for x,y in zip(df1['Lat'], df1['Lon'])]
df2['point'] = [(x, y) for x,y in zip(df2['Lat'], df2['Lon'])]
df2['closest'] = [closest_point(x, list(df1['point'])) for x in df2['point']]
df2['zone'] = [match_value(df1, 'point', x, 'Zone') for x in df2['closest']]
print(df2)
# Lat Lon point closest zone
# 0 50.637552 3.075079 (50.6375524099, 3.07507914474) (50.6375896, 3.0751209) B
# 1 50.637571 3.075082 (50.6375714407, 3.07508201591) (50.6375896, 3.0751209) B
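The nearest-corner lookup above returns the zone of the closest vertex, which is not guaranteed to be the zone whose polygon actually contains the point. If a strict containment test is needed, the manual zone "A" step from the question could be generalized to every polygon in df1. The following is a minimal sketch under stated assumptions: `assign_zones` is a hypothetical helper name, df1 is assumed to have the Id, Order, Lat, Lon and Zone columns shown in the question, and the corners are assumed to trace the polygon outline when sorted by Order.

```python
import pandas as pd
import matplotlib.path as mplPath

# Sample data copied from the question
df1 = pd.DataFrame({
    'Id': ['00001'] * 4 + ['00002'] * 4,
    'Order': [1, 2, 3, 4] * 2,
    'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    'Zone': ['A'] * 4 + ['B'] * 4,
})
df2 = pd.DataFrame({'Lat': [50.6375524099, 50.6375714407],
                    'Lon': [3.07507914474, 3.07508201591]})


def assign_zones(points, polygons):
    """Hypothetical helper: label each point with the zone that contains it."""
    result = points.copy()
    result['zone'] = None  # stays None for points outside every polygon
    coords = result[['Lat', 'Lon']].to_numpy()
    # Sort by Order so the corners are visited along the polygon outline,
    # then build one polygon per (Id, Zone) pair
    ordered = polygons.sort_values('Order')
    for (poly_id, zone), corners in ordered.groupby(['Id', 'Zone']):
        path = mplPath.Path(corners[['Lat', 'Lon']].to_numpy())
        # Path treats the outline as closed, so four corners are enough
        inside = path.contains_points(coords)
        result.loc[inside, 'zone'] = zone
    return result


labeled = assign_zones(df2, df1)
print(labeled)
```

With the sample data, both points should come back labeled B, which agrees with the example given in the question. Points that fall outside every polygon keep a zone of None, whereas the nearest-corner approach always assigns some zone.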