Problem description
I have a set of objects (cars) and their positions over time. I would like to get the distance between each car and its nearest neighbour, and calculate the average of these distances for each time point. An example dataframe is as follows:
time = [0, 0, 0, 1, 1, 2, 2]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56]
car = [1, 2, 3, 1, 3, 4, 5]
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car})
df
        x    y  car
time
0     216   13    1
0     218   12    2
0     217   12    3
1     280  110    1
1     290  109    3
2     130    3    4
2     132   56    5
For each time point, I would like to know the nearest car neighbour for each car. Example:
df2
      car  nearest_neighbour  euclidean_distance
time
0       1                  3                1.41
0       2                  3                1.00
0       3                  1                1.41
1       1                  3               10.05
1       3                  1               10.05
2       4                  5               53.04
2       5                  4               53.04
I know I can calculate the pairwise distances between cars from "How to apply euclidean distance function to a groupby object in pandas dataframe?", but how do I get the nearest neighbour for each car?
After that, it seems simple enough to get an average of the distances for each frame using groupby, but it's the second step that really throws me off. Help appreciated!
Recommended answer
It might be a bit of overkill, but you could use NearestNeighbors from scikit-learn.
An example:
import numpy as np
from sklearn.neighbors import NearestNeighbors
import pandas as pd

def nn(x):
    # Ask for 2 neighbours: the closest match to each point is the point itself
    nbrs = NearestNeighbors(n_neighbors=2, algorithm='auto', metric='euclidean').fit(x)
    distances, indices = nbrs.kneighbors(x)
    return distances, indices

time = [0, 0, 0, 1, 1, 2, 2]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56]
car = [1, 2, 3, 1, 3, 4, 5]
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car})

groups = df.groupby('time')
nn_rows = []
for t, group in groups:
    # This has the index of the nearest neighbor in the group, as well as the distance
    distances, indices = nn(group[['x', 'y']].to_numpy())
    for j, (dist, idx) in enumerate(zip(distances, indices)):
        # Column 0 is the car itself (distance 0); column 1 is its nearest neighbour
        nn_rows.append({'time': t,
                        'car': group.iloc[j]['car'],
                        'euclidean_distance': dist[1],
                        'nearest_neighbour': group.iloc[idx[1]]['car']})

nn_df = pd.DataFrame(nn_rows).set_index('time')
Result:
      car  euclidean_distance  nearest_neighbour
time
0       1            1.414214                  3
0       2            1.000000                  3
0       3            1.000000                  2
1       1           10.049876                  3
1       3           10.049876                  1
2       4           53.037722                  5
2       5           53.037722                  4
(Note that at time 0, car 3's nearest neighbor is car 2, not car 1 as in the question's expected output: sqrt((217-216)**2 + 1) is about 1.4142135623730951, while sqrt((218-217)**2 + 0) = 1.)
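The question also asks for the average nearest-neighbour distance at each time point, which the code above stops short of. A minimal sketch of that last step follows. To keep it runnable on its own, nn_df is rebuilt here from the values in the result table (in practice you would reuse the nn_df produced above), and mean_distance is just an illustrative name.

```python
import pandas as pd

# Rebuilt from the result table above so the snippet is self-contained;
# normally this is the nn_df produced by the nearest-neighbour step.
nn_df = pd.DataFrame({
    'time': [0, 0, 0, 1, 1, 2, 2],
    'car': [1, 2, 3, 1, 3, 4, 5],
    'euclidean_distance': [1.414214, 1.000000, 1.000000,
                           10.049876, 10.049876,
                           53.037722, 53.037722],
    'nearest_neighbour': [3, 3, 2, 3, 1, 5, 4],
}).set_index('time')

# 'time' is the index name, so groupby can group on it directly.
# mean_distance is a hypothetical name for the per-time average.
mean_distance = nn_df.groupby('time')['euclidean_distance'].mean()
print(mean_distance)
```

With the example data this gives roughly 1.138 at time 0, while times 1 and 2 each contain a single pair of cars, so the mean equals that pair's distance.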