Question

I have a dataframe of all_points and their coordinates:

all_points =
   point_id   latitude  longitude
0          1  41.894577 -87.645307
1          2  41.894647 -87.640426
2          3  41.894713 -87.635513
3          4  41.894768 -87.630629
4          5  41.894830 -87.625793

and a dataframe of the parent_points:

parent_pts =
       parent_id
0       1
1       2

I want to create a column on the all_points dataframe with the closest parent point to each point.

This is my attempt, but I may be overcomplicating it:

from scipy.spatial.distance import cdist

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]

all_points['point'] = [(x, y) for x,y in zip(all_points['latitude'], all_points['longitude'])]
parent_pts['point'] = all_points['point'][all_points['point_id'].isin(parent_pts['parent_id'])]

all_points['parent'] = [match_value(parent_pts, 'point', x, 'parent_id') for x in all_points['closest']]

The parent points are a subset of all_points.

I get this error when I try to use the closest_point function:

ValueError: XB must be a 2-dimensional array.

Recommended answer

First, it appears that your longitudes and latitudes are locations on Earth. Assuming that Earth is a sphere, the distance between two points should be computed as the great-circle distance, not as the Euclidean distance that you get from cdist.
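To make the great-circle idea concrete, here is a minimal standard-library sketch that uses the haversine formula on a spherical Earth. The helper name `haversine_m`, the mean-radius constant, and the plain-dict layout are assumptions made for illustration; they are not part of the question's code or of any library.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius (spherical model)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# The question's data as point_id -> (latitude, longitude)
all_points = {
    1: (41.894577, -87.645307),
    2: (41.894647, -87.640426),
    3: (41.894713, -87.635513),
    4: (41.894768, -87.630629),
    5: (41.894830, -87.625793),
}
parent_ids = [1, 2]

# For every point, pick the parent with the smallest great-circle distance
closest_parent = {
    pid: min(parent_ids, key=lambda par: haversine_m(*coords, *all_points[par]))
    for pid, coords in all_points.items()
}
print(closest_parent)  # {1: 1, 2: 2, 3: 2, 4: 2, 5: 2}
```

This brute-force loop is fine for a handful of points; for large tables a vectorized or tree-based search would likely be preferable.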

The easiest approach from a programming point of view (apart from the learning curve) is to use the astropy package. Its documentation is decent and sometimes includes useful examples; see, e.g., match_coordinates_sky() or the astropy guide to catalog matching.

Then you might do something like this:

>>> from astropy.units import Quantity
>>> from astropy.coordinates import match_coordinates_sky, SkyCoord, EarthLocation
>>> from pandas import DataFrame
>>> import numpy as np
>>>
>>> # Create your data as I understood it:
>>> all_points = DataFrame({'point_id': np.arange(1,6), 'latitude': [41.894577, 41.894647, 41.894713, 41.894768, 41.894830], 'longitude': [-87.645307, -87.640426, -87.635513, -87.630629, -87.625793 ]})
>>> parent_pts = DataFrame({'parent_id': [1, 2]})
>>>
>>> # Create a frame with the coordinates of the "parent" points:
>>> parent_coord = all_points.loc[all_points['point_id'].isin(parent_pts['parent_id'])]
>>> print(parent_coord)
    latitude  longitude  point_id
0  41.894577 -87.645307         1
1  41.894647 -87.640426         2
>>>
>>> # Create coordinate array for "points" (in principle the below statements
>>> # could be combined into a single one):
>>> all_lon = Quantity(all_points['longitude'], unit='deg')
>>> all_lat = Quantity(all_points['latitude'], unit='deg')
>>> all_pts = SkyCoord(EarthLocation.from_geodetic(all_lon, all_lat).itrs, frame='itrs')
>>>
>>> # Create coordinate array for "parent points":
>>> parent_lon = Quantity(parent_coord['longitude'], unit='deg')
>>> parent_lat = Quantity(parent_coord['latitude'], unit='deg')
>>> parent_catalog = SkyCoord(EarthLocation.from_geodetic(parent_lon, parent_lat).itrs, frame='itrs')
>>>
>>> # Get the indices (in parent_catalog) of parent coordinates
>>> # closest to each point:
>>> matched_indices = match_coordinates_sky(all_pts, parent_catalog)[0]
Downloading http://maia.usno.navy.mil/ser7/finals2000A.all
|========================================================================| 3.1M/3.1M (100.00%)         0s
>>> all_points['parent_id'] = [parent_pts['parent_id'][idx] for idx in matched_indices]
>>> print(all_points)
    latitude  longitude  point_id  parent_id
0  41.894577 -87.645307         1          1
1  41.894647 -87.640426         2          2
2  41.894713 -87.635513         3          2
3  41.894768 -87.630629         4          2
4  41.894830 -87.625793         5          2

Note that match_coordinates_sky() returns not only the matching indices but also the angular separations between each data point and its matched "parent" point, as well as the distances in meters between each data point and its matched "parent" point. These may be useful for your problem.
