This article describes a Python approach to splitting trajectories into individual steps. It should be a useful reference for anyone facing a similar problem; follow along below.

Problem Description

I have trajectories created from moves between clusters such as these:

user_id,trajectory
11011,[[86], [110], [110]]
2139671,[[89], [125]]
3945641,[[36], [73], [110], [110]]
10024312,[[123], [27], [97], [97], [97], [110]]
14270422,[[0], [110], [174]]
14283758,[[110], [184]]
14317445,[[50], [88]]
14331818,[[0], [22], [36], [131], [131]]
14334591,[[107], [19]]
14373703,[[35], [97], [97], [97], [17], [58]]

I would like to split the trajectories with multiple moves into individual segments, but I am unsure how.

Example:

14373703,[[35], [97], [97], [97], [17], [58]]

into

14373703,[[35,97], [97,97], [97,17], [17,58]]

The purpose is to then use these as edges in NetworkX to analyse them as a graph and identify dense movements (edges) between the individual clusters (nodes).
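For context, here is a minimal sketch of that follow-up step, assuming the segments are already available as a plain Python list (the code below is illustrative and not part of the original question):

import networkx as nx

# segments for user 14373703, taken from the desired output above
segments = [[35, 97], [97, 97], [97, 17], [17, 58]]

G = nx.DiGraph()
for src, dst in segments:
    if G.has_edge(src, dst):
        # repeated moves between the same clusters accumulate as edge weight
        G[src][dst]['weight'] += 1
    else:
        G.add_edge(src, dst, weight=1)

# edges with the highest weight correspond to the densest movements
print(sorted(G.edges(data='weight'), key=lambda e: e[2], reverse=True))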

This is the code I've used to create the trajectories initially:

import pandas as pd
import numpy as np

# Import Data (raw string avoids backslash-escape issues in the Windows path)
data = pd.read_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_outputs.csv', delimiter=',', engine='python')
#print len(data),"rows"

# Create Data Fame
df = pd.DataFrame(data, columns=['user_id','timestamp','latitude','longitude','cluster_labels'])

# Filter Data Frame by count of user_id
filtered = df.groupby('user_id').filter(lambda x: x['user_id'].count()>1)
#filtered.to_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_final_filtered.csv', index=False, header=True)

# Get a list of unique user_id values
uniqueIds = np.unique(filtered['user_id'].values)

# Get the ordered (by timestamp) coordinates for each user_id
output = [[id,filtered.loc[filtered['user_id']==id].sort_values(by='timestamp')[['cluster_labels']].values.tolist()] for id in uniqueIds]

# Save outputs as csv
outputs = pd.DataFrame(output)
#print outputs
headers = ['user_id','trajectory']
outputs.to_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_moves.csv', index=False, header=headers)

If splitting this way is possible, can it be done during the processing rather than after the fact? I'd like to perform it while the trajectories are being created, to eliminate any post-processing.
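One way the split could be folded into the pipeline above is to pair each label with the next one while building output. The sketch below is illustrative only; it is not the original poster's code and assumes cluster_labels holds a single label per row:

# build [user_id, [[a, b], [b, c], ...]] directly, pairing consecutive labels
output = []
for uid in uniqueIds:
    labels = (filtered.loc[filtered['user_id'] == uid]
                      .sort_values(by='timestamp')['cluster_labels']
                      .tolist())
    segments = [[a, b] for a, b in zip(labels[:-1], labels[1:])]
    output.append([uid, segments])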

Solution

I think you can use groupby with apply and a custom function that uses zip to build the required list of lists inside a list comprehension:

Notice:

count returns only the number of non-NaN values; if you want to filter by group length regardless of NaN, len is the better choice.
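A minimal illustration of the difference, using a hypothetical toy frame rather than the question's data:

import pandas as pd
import numpy as np

# one group contains a NaN cluster label
toy = pd.DataFrame({'user_id': [1, 1, 2],
                    'cluster_labels': [5, np.nan, 7]})

g = toy.groupby('user_id')['cluster_labels']
print(g.count().to_dict())    # {1: 1, 2: 1} -> NaN is not counted
print(g.apply(len).to_dict()) # {1: 2, 2: 1} -> every row is counted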

#filtering and sorting     
filtered = df.groupby('user_id').filter(lambda x: len(x['user_id'])>1)
filtered = filtered.sort_values(by='timestamp')

# pair each label with the next one: [a, b, c] -> [[a, b], [b, c]]
f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
print (df2)
    user_id                                     cluster_labels
0     11011                            [[86, 110], [110, 110]]
1   2139671                                        [[89, 125]]
2   3945641                  [[36, 73], [73, 110], [110, 110]]
3  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
4  14270422                             [[0, 110], [110, 174]]
5  14283758                                       [[110, 184]]
6  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
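To see what f does on its own, here it is applied to one user's ordered labels (the values for user 14373703 from the question):

f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]

labels = [35, 97, 97, 97, 17, 58]
print(f(labels))
# [[35, 97], [97, 97], [97, 97], [97, 17], [17, 58]]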

A similar solution, where the filtering is done as the last step using boolean indexing:

filtered = filtered.sort_values(by='timestamp')

f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
df2 = df2[df2['cluster_labels'].str.len() > 0]
print (df2)
    user_id                                     cluster_labels
1     11011                            [[86, 110], [110, 110]]
2   2139671                                        [[89, 125]]
3   3945641                  [[36, 73], [73, 110], [110, 110]]
4  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
5  14270422                             [[0, 110], [110, 174]]
6  14283758                                       [[110, 184]]
7  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
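As a follow-up (not part of the original answer), the per-user lists in df2 can be flattened into a single edge list; counting the pairs plays the same role as the edge weights in the NetworkX sketch shown with the question:

from collections import Counter
from itertools import chain

# flatten the per-user segment lists into one list of (src, dst) tuples
edges = [tuple(seg) for seg in chain.from_iterable(df2['cluster_labels'])]

# the most frequent pairs correspond to the densest movements between clusters
print(Counter(edges).most_common(3))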

That concludes this article on splitting trajectories into steps in Python. We hope the answer above is helpful, and thank you for your continued support!
