Finding the closest point in Pandas DataFrames


Problem description

I am quite new to Python. I have the following table in Postgres. Each polygon is stored as four coordinate rows that share the same Id and Zone name. I have loaded this data into a pandas DataFrame called df1.

Id     Order  Lat          Lon          Zone
00001  1      50.6373473   3.075029928  A
00001  2      50.63740441  3.075068636  A
00001  3      50.63744285  3.074951754  A
00001  4      50.63737839  3.074913884  A
00002  1      50.6376054   3.0750528    B
00002  2      50.6375896   3.0751209    B
00002  3      50.6374239   3.0750246    B
00002  4      50.6374404   3.0749554    B

I have JSON data with Lat and Lon values, which I have stored in a pandas DataFrame called df2.

Lat                  Lon
50.6375524099   3.07507914474
50.6375714407   3.07508201591

My task is to compare the Lat and Lon values in df2 with the four coordinates of each zone in df1, extract the matching zone name, and add it to df2.

For example, (50.6375524099, 3.07507914474) belongs to Zone B.

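import matplotlib.path as mplPath
import numpy as np
import pandas as pd

# `engine` is an existing SQLAlchemy engine (its creation is not shown here)
# Polygon corners with Id and Zone
df1 = pd.read_sql_query("""SELECT * from "zmap" """, con=engine)
# Points with lat, lon values
df2 = pd.read_sql_query("""SELECT * from "E1" """, con=engine)
# zip() is lazy in Python 3, so materialize it before assigning
df2['latlon'] = list(zip(df2.lat, df2.lon))
zones = [
    ["A", [[50.637347297, 3.075029928], [50.637404408, 3.075068636],
           [50.637442847, 3.074951754], [50.637378390, 3.074913884]]]]
for name, corners in zones:  # for each zone's corner points
    polygon = mplPath.Path(np.array(corners))
    # Boolean mask: which points fall inside this zone
    inside = polygon.contains_points(df2.latlon.values.tolist())
    # Label the points that are in the current zone (.loc is required here)
    df2.loc[inside, 'zone'] = name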

Currently I have done this manually for Zone 'A' only. I need to generate the zone for every coordinate in df2.

Recommended answer

This sounds like a good use case for scipy cdist.
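For context, cdist takes two collections of points and returns the matrix of pairwise distances (Euclidean by default), so taking argmin along each row gives the index of the nearest candidate. A minimal sketch with made-up coordinates follows; the names `candidates` and `queries` are illustrative and do not come from the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data (not from the question): three candidates, two queries.
candidates = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
queries = np.array([[0.9, 1.2], [4.0, 4.5]])

# dist[i, j] is the Euclidean distance from queries[i] to candidates[j].
dist = cdist(queries, candidates)

# For each query point, the index of its nearest candidate.
nearest = dist.argmin(axis=1)
print(nearest)
```

Treating Lat and Lon as planar coordinates in this way is an approximation, which is probably acceptable over distances as small as those in the question.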

import pandas as pd
from scipy.spatial.distance import cdist


data1 = {'Lat': pd.Series([50.6373473,50.63740441,50.63744285,50.63737839,50.6376054,50.6375896,50.6374239,50.6374404]),
         'Lon': pd.Series([3.075029928,3.075068636,3.074951754,3.074913884,3.0750528,3.0751209,3.0750246,3.0749554]),
         'Zone': pd.Series(['A','A','A','A','B','B','B','B'])}

data2 = {'Lat': pd.Series([50.6375524099,50.6375714407]),
         'Lon': pd.Series([3.07507914474,3.07508201591])}


def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]


df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df1['point'] = [(x, y) for x,y in zip(df1['Lat'], df1['Lon'])]
df2['point'] = [(x, y) for x,y in zip(df2['Lat'], df2['Lon'])]

df2['closest'] = [closest_point(x, list(df1['point'])) for x in df2['point']]
df2['zone'] = [match_value(df1, 'point', x, 'Zone') for x in df2['closest']]

print(df2)
#    Lat        Lon       point                           closest                  zone
# 0  50.637552  3.075079  (50.6375524099, 3.07507914474)  (50.6375896, 3.0751209)  B
# 1  50.637571  3.075082  (50.6375714407, 3.07508201591)  (50.6375896, 3.0751209)  B
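Note that the nearest corner is not necessarily a corner of the polygon that contains the point. If strict containment is what is needed, the loop from the question can be generalized to every zone by building one polygon per Zone directly from df1. The following is only a sketch under some assumptions: the rows of each zone, taken in Order, trace the polygon boundary, and zones do not overlap (otherwise the last matching zone wins).

```python
import matplotlib.path as mplPath
import pandas as pd

# Data copied from the question's tables.
df1 = pd.DataFrame({
    'Order': [1, 2, 3, 4] * 2,
    'Lat': [50.6373473, 50.63740441, 50.63744285, 50.63737839,
            50.6376054, 50.6375896, 50.6374239, 50.6374404],
    'Lon': [3.075029928, 3.075068636, 3.074951754, 3.074913884,
            3.0750528, 3.0751209, 3.0750246, 3.0749554],
    'Zone': ['A'] * 4 + ['B'] * 4,
})
df2 = pd.DataFrame({'Lat': [50.6375524099, 50.6375714407],
                    'Lon': [3.07507914474, 3.07508201591]})

points = df2[['Lat', 'Lon']].to_numpy()
df2['zone'] = None  # points outside every polygon keep None

# Assumption: within a zone, rows sorted by Order walk around the polygon.
for zone, corners in df1.sort_values(['Zone', 'Order']).groupby('Zone'):
    polygon = mplPath.Path(corners[['Lat', 'Lon']].to_numpy())
    inside = polygon.contains_points(points)  # boolean mask per point
    df2.loc[inside, 'zone'] = zone

print(df2)
```

With the sample data, both points fall inside the Zone B polygon, which agrees with the example given in the question.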
