I have trajectories created from moves between clusters such as these:
user_id,trajectory
11011,[[86], [110], [110]]
2139671,[[89], [125]]
3945641,[[36], [73], [110], [110]]
10024312,[[123], [27], [97], [97], [97], [110]]
14270422,[[0], [110], [174]]
14283758,[[110], [184]]
14317445,[[50], [88]]
14331818,[[0], [22], [36], [131], [131]]
14334591,[[107], [19]]
14373703,[[35], [97], [97], [97], [17], [58]]
I would like to split the trajectories with multiple moves into individual segments, but I am unsure how.
Example:
14373703,[[35], [97], [97], [97], [17], [58]]
into
14373703,[[35,97], [97,97], [97,17], [17,58]]
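The transformation in this example can be sketched in plain Python by pairing each cluster label with its successor. The helper name `split_into_segments` is hypothetical and used only for illustration. Note that three consecutive visits to cluster 97 yield two [97, 97] segments when every consecutive pair is kept.

```python
def split_into_segments(trajectory):
    """Turn [[35], [97], ...] into [[35, 97], [97, 97], ...]."""
    # Flatten the single-element lists into a plain list of cluster labels.
    labels = [step[0] for step in trajectory]
    # Pair each label with the one that follows it.
    return [[a, b] for a, b in zip(labels, labels[1:])]


trajectory = [[35], [97], [97], [97], [17], [58]]
segments = split_into_segments(trajectory)
print(segments)
```

A trajectory with a single point produces an empty list, since there is no move to record.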
The purpose is to then use these as edges in NetworkX to analyse them as a graph and identify dense movements (edges) between the individual clusters (nodes).
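As a rough sketch of that goal, the segments can be counted per (source, target) pair so that frequent moves stand out as heavy edges. The dictionary `segments_by_user` is a hypothetical container holding two of the sample trajectories already split into segments; it is an assumption for illustration, not part of the original code.

```python
from collections import Counter

# Hypothetical input: user_id -> list of [from_cluster, to_cluster] segments.
segments_by_user = {
    11011: [[86, 110], [110, 110]],
    3945641: [[36, 73], [73, 110], [110, 110]],
}

# Count how often each directed move occurs across all users.
edge_counts = Counter(
    (src, dst)
    for segments in segments_by_user.values()
    for src, dst in segments
)

# (source, target, weight) triples, one per distinct move.
weighted_edges = [(src, dst, n) for (src, dst), n in edge_counts.items()]
print(weighted_edges)
```

Triples of this shape are the form that NetworkX's `DiGraph.add_weighted_edges_from` accepts, so the counts can serve as edge weights when looking for dense movements.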
This is the code I've used to create the trajectories initially:
import numpy as np
import pandas as pd

# Import data (raw strings keep the backslashes in the Windows paths literal)
data = pd.read_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_outputs.csv', delimiter=',', engine='python')
# print(len(data), "rows")

# Create DataFrame
df = pd.DataFrame(data, columns=['user_id', 'timestamp', 'latitude', 'longitude', 'cluster_labels'])

# Filter DataFrame by count of user_id (keep users with more than one row)
filtered = df.groupby('user_id').filter(lambda x: x['user_id'].count() > 1)
# filtered.to_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_final_filtered.csv', index=False, header=True)

# Get a list of unique user_id values
uniqueIds = np.unique(filtered['user_id'].values)

# Get the ordered (by timestamp) cluster labels for each user_id
output = [
    [uid, filtered.loc[filtered['user_id'] == uid].sort_values(by='timestamp')[['cluster_labels']].values.tolist()]
    for uid in uniqueIds
]

# Save outputs as CSV
outputs = pd.DataFrame(output)
# print(outputs)
headers = ['user_id', 'trajectory']
outputs.to_csv(r'G:\Programming Projects\GGS 681\dmv_tweets_20170309_20170314_cluster_moves.csv', index=False, header=headers)
If splitting this way is possible, can it be completed during the processing, as opposed to after the fact? I'd like to perform it while creating, to eliminate any postprocessing.
I think you can use groupby with apply and a custom function based on zip. A list comprehension converts each pair to a list, which gives the required list of lists:
Note:
count returns the number of non-NaN values. To filter by group length regardless of NaN values, len is the better choice.
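To illustrate the difference, here is a minimal sketch on a hypothetical Series that contains one NaN (the values are made up for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical data: three entries, one of them missing.
s = pd.Series([1.0, np.nan, 3.0])

non_null = s.count()  # counts only the non-NaN entries
total = len(s)        # counts every entry, NaN included

print(non_null, total)
```

When the column being counted can contain missing values, the two filters can therefore keep different groups.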
# filtering and sorting
filtered = df.groupby('user_id').filter(lambda x: len(x['user_id']) > 1)
filtered = filtered.sort_values(by='timestamp')

f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
print(df2)

    user_id                                     cluster_labels
0     11011                            [[86, 110], [110, 110]]
1   2139671                                        [[89, 125]]
2   3945641                  [[36, 73], [73, 110], [110, 110]]
3  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
4  14270422                             [[0, 110], [110, 174]]
5  14283758                                       [[110, 184]]
6  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
A similar solution, where filtering is the last step and is done by boolean indexing:

filtered = filtered.sort_values(by='timestamp')

f = lambda x: [list(a) for a in zip(x[:-1], x[1:])]
df2 = filtered.groupby('user_id')['cluster_labels'].apply(f).reset_index()
df2 = df2[df2['cluster_labels'].str.len() > 0]
print(df2)

    user_id                                     cluster_labels
1     11011                            [[86, 110], [110, 110]]
2   2139671                                        [[89, 125]]
3   3945641                  [[36, 73], [73, 110], [110, 110]]
4  10024312  [[123, 27], [27, 97], [97, 97], [97, 97], [97,...
5  14270422                             [[0, 110], [110, 174]]
6  14283758                                       [[110, 184]]
7  14373703  [[35, 97], [97, 97], [97, 97], [97, 17], [17, ...
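For anyone who wants to try the approach without the original CSV, the following is a self-contained sketch on a small made-up DataFrame. The data values and the helper name `to_segments` are assumptions for illustration; the final step, flattening the per-user segments into one edge list, is an extra that is not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: user 1 has three points (out of time order),
# user 2 has two points, user 3 has a single point and so no moves.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "timestamp": [3, 1, 2, 1, 2, 1],
    "cluster_labels": [58, 35, 97, 89, 125, 7],
})


def to_segments(labels):
    """Pair each cluster label with the next one in the group."""
    values = labels.tolist()
    return [list(pair) for pair in zip(values[:-1], values[1:])]


# Sort by time, then build the segments per user.
segments = (
    df.sort_values("timestamp")
      .groupby("user_id")["cluster_labels"]
      .apply(to_segments)
)

# Drop users whose trajectory has no moves (empty segment list).
segments = segments[segments.str.len() > 0]

# Flatten into a single list of (from_cluster, to_cluster) edges.
edges = [tuple(pair) for pairs in segments for pair in pairs]
print(segments)
print(edges)
```

Because the segments are built inside the groupby step, no separate post-processing pass over a saved trajectory file is needed.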