Python: Split trajectories into steps

Problem description

I have trajectories created from moves between clusters such as these:
    user_id,trajectory
    11011,[[86], [110], [110]]
    2139671,[[89], [125]]
    3945641,[[36], [73], [110], [110]]
    10024312,[[123], [27], [97], [97], [97], [110]]
    14270422,[[0], [110], [174]]
    14283758,[[110], [184]]
    14317445,[[50], [88]]
    14331818,[[0], [22], [36], [131], [131]]
    14334591,[[107], [19]]
    14373703,[[35], [97], [97], [97], [17], [58]]
I would like to split the trajectories with multiple moves into individual segments, but I am unsure how.
Example:
14373703,[[35], [97], [97], [97], [17], [58]]
into
14373703,[[35,97], [97,97], [97,17], [17,58]]
The purpose is to then use these as edges in NetworkX to analyse them as a graph and identify dense movements (edges) between the individual clusters (nodes).
This is the code I've used to create the trajectories initially:
    import numpy as np
    import pandas as pd

    # Import data (raw strings keep the backslashes in the Windows paths literal)
    data = pd.read_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_outputs.csv', delimiter=',', engine='python')
    # print(len(data), "rows")

    # Create DataFrame
    df = pd.DataFrame(data, columns=['user_id', 'timestamp', 'latitude', 'longitude', 'cluster_labels'])

    # Filter DataFrame by count of user_id (keep users with more than one row)
    filtered = df.groupby('user_id').filter(lambda x: x['user_id'].count() > 1)
    # filtered.to_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_final_filtered.csv', index=False, header=True)

    # Get a list of unique user_id values
    uniqueIds = np.unique(filtered['user_id'].values)

    # Get the ordered (by timestamp) cluster labels for each user_id
    output = [[id, filtered.loc[filtered['user_id'] == id].sort_values(by='timestamp')[['cluster_labels']].values.tolist()] for id in uniqueIds]

    # Save outputs as csv
    outputs = pd.DataFrame(output)
    # print(outputs)
    headers = ['user_id', 'trajectory']
    outputs.to_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_moves.csv', index=False, header=headers)
If splitting this way is possible, can it be completed during the processing, as opposed to after the fact? I'd like to perform it while creating, to eliminate any postprocessing.
Solution
I think you can use groupby with apply and a custom function built on zip. A list comprehension is needed so that the output is a list of lists:
Note:
count returns the number of non-NaN values. If you want to filter by group length regardless of NaN, len is the better choice.
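To make the count versus len distinction concrete, here is a minimal sketch on hypothetical toy data (the values and the missing label are assumptions, not taken from the question's CSV). It counts on cluster_labels rather than user_id only so that a NaN can actually appear:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: user 2 has one missing cluster label.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "cluster_labels": [86.0, 110.0, 89.0, np.nan, 50.0],
})

grouped = df.groupby("user_id")["cluster_labels"]
non_nan_counts = grouped.count()  # non-NaN values per user
group_sizes = grouped.size()      # rows per user, NaN included

# Filtering with count() drops user 2, because only one label is non-NaN.
by_count = df.groupby("user_id").filter(lambda g: g["cluster_labels"].count() > 1)

# Filtering with len() keeps user 2, because the group has two rows.
by_len = df.groupby("user_id").filter(lambda g: len(g) > 1)

print(by_count["user_id"].tolist())
print(by_len["user_id"].tolist())
```

When the counted column never contains NaN, as is likely for user_id, both filters should select the same rows.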
    # filtering and sorting
    filtered = df.groupby('user_id').filter(lambda x: len(x['user_id']) > 1)
    filtered = filtered.sort_values(by='timestamp')

    f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
    df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
    print (df2)

        user_id  cluster_labels
    0     11011  [[86, 110], [110, 110]]
    1   2139671  [[89, 125]]
    2   3945641  [[36, 73], [73, 110], [110, 110]]
    3  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
    4  14270422  [[0, 110], [110, 174]]
    5  14283758  [[110, 184]]
    6  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
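The pairing step can be looked at in isolation, without pandas. The sketch below applies the same zip idea to a plain list. The helper names to_segments and flatten are hypothetical and only serve the illustration:

```python
def to_segments(trajectory):
    """Pair each stop with the next one: [a, b, c] -> [[a, b], [b, c]]."""
    return [list(pair) for pair in zip(trajectory[:-1], trajectory[1:])]


def flatten(nested):
    """Turn [[35], [97]] (the shape stored in the question's CSV) into [35, 97]."""
    return [item[0] for item in nested]


nested = [[35], [97], [97], [97], [17], [58]]
segments = to_segments(flatten(nested))
print(segments)

# A trajectory with a single stop has no consecutive pair, so the result is empty.
single = to_segments([110])
print(single)
```

Repeated stops such as [97, 97] are kept as self-loops, and a one-stop trajectory yields an empty list, so users with a single row need to be filtered out either before or after the pairing.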
A similar solution, where filtering is the last step and is done with boolean indexing:
    filtered = filtered.sort_values(by='timestamp')

    f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
    df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
    df2 = df2[df2['cluster_labels'].str.len() > 0]
    print (df2)

        user_id  cluster_labels
    1     11011  [[86, 110], [110, 110]]
    2   2139671  [[89, 125]]
    3   3945641  [[36, 73], [73, 110], [110, 110]]
    4  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
    5  14270422  [[0, 110], [110, 174]]
    6  14283758  [[110, 184]]
    7  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
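Since the stated goal is to find dense movements between clusters, one possible next step is to count how often each (source, destination) pair occurs across all users. The sketch below uses only the standard library. The dictionary segments_by_user is a hypothetical stand-in for df2, filled with the rows that are printed in full above:

```python
from collections import Counter

# Hypothetical stand-in for df2: user_id -> list of [source, destination] segments.
segments_by_user = {
    11011: [[86, 110], [110, 110]],
    2139671: [[89, 125]],
    3945641: [[36, 73], [73, 110], [110, 110]],
    14270422: [[0, 110], [110, 174]],
    14283758: [[110, 184]],
}

# Count every directed move between clusters across all users.
edge_counts = Counter(
    (src, dst)
    for segments in segments_by_user.values()
    for src, dst in segments
)

# (source, destination, weight) triples, heaviest first.
weighted_edges = sorted(
    ((src, dst, n) for (src, dst), n in edge_counts.items()),
    key=lambda edge: edge[2],
    reverse=True,
)
print(weighted_edges[0])
```

Triples in this shape should be usable with NetworkX's add_weighted_edges_from on a DiGraph, assuming direction matters for the analysis. Whether self-loops such as (110, 110) belong in the graph is a modelling choice the question leaves open.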
