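My goal is to convert GTFS stop and trip information into a graph in which the vertices are stops (from the GTFS stops.txt) and the edges are trips (from the GTFS stop_times.txt). The first step is obvious:
library(igraph)
# Read in the GTFS files
stops <- read.csv("stops.txt")
stop_times <- read.csv("stop_times.txt")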
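My first instinct was to simply use igraph's graph_from_data_frame function, but there is a serious drawback: the stop_times data frame is not structured in the schema that function needs. Its schema looks like this:
head(stop_times)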
  trip_id stop_id arrival_time departure_time stop_sequence shape_dist_traveled
1 A895151  F04272     06:20:00       06:20:00            10                   0
2 A895151  F04184     06:22:00       06:22:00            20                 648
3 A895151  F04319     06:24:00       06:24:00            30                1224
4 A895151  F04369     06:27:00       06:27:00            40                2779
5 A895151  008264     06:31:00       06:31:00            50                5620
6 A895151  F01520     06:33:00       06:33:00            60                6691
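In other words, it contains stop_ids with the arrival and departure times at each stop, whereas I want each row to hold start_stop_id, end_stop_id, start_time and end_time (in effect not "stops" but "transits" derived from the stops). This conversion seems challenging to me, because I would have to iterate over the rows of stop_times and decide whether consecutive rows belong to the same trip_id: if they do, compute the start/end data; if they don't, insert NULL or find some other way to separate the trips... This has me confused.
Is there an elegant way to combine these two data frames into the desired graph?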
Best answer
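The 'from' and 'to' columns can be generated by "shifting" the value from the next row up onto the "current" row, and the stop information can simply be joined on.
Here is an example to illustrate, using library(data.table):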
library(data.table)

## here I'm using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")
#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops
#setDT(dt_stop_times)
#setDT(dt_stops)
## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]
## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)
## create a new column by shifting the stop_id of the following row up
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]
## you will have NAs at this point because the last stop doesn't go anywhere.
## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"),
                     arrival_time_stop_to = shift(arrival_time, type = "lead"),
                     departure_time_stop_to = shift(departure_time, type = "lead")),
              by = .(trip_id)]
## now you have your 'from' and 'to' columns from which you can make your igraph
## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]
# trip_id stop_id stop_name_from arrival_time stop_id_to
# 1: 1.T0.3-86-A-mjp-1.7.R 4174 71-RMIT/Plenty Rd (Bundoora) 25:42:00 4485
# 2: 1.T0.3-86-A-mjp-1.7.R 4485 70-Janefield Dr/Plenty Rd (Bundoora) 25:43:00 4486
# 3: 1.T0.3-86-A-mjp-1.7.R 4486 69-Taunton Dr/Plenty Rd (Bundoora) 25:44:00 4487
# 4: 1.T0.3-86-A-mjp-1.7.R 4487 68-Greenhills Rd/Plenty Rd (Bundoora) 25:45:00 4488
# 5: 1.T0.3-86-A-mjp-1.7.R 4488 67-Bundoora Square SC/Plenty Rd (Bundoora) 25:46:00 4489
# ---
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H 17871 7-Queen Victoria Market/Elizabeth St (Melbourne City) 23:25:00 17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H 17873 5-Melbourne Central Station/Elizabeth St (Melbourne City) 23:27:00 17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H 17875 3-Bourke Street Mall/Elizabeth St (Melbourne City) 23:30:00 17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H 17876 2-Collins St/Elizabeth St (Melbourne City) 23:31:00 17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H 17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City) 23:32:00 NA
# arrival_time_stop_to
# 1: 25:43:00
# 2: 25:44:00
# 3: 25:45:00
# 4: 25:46:00
# 5: 25:47:00
# ---
# 9415793: 23:27:00
# 9415794: 23:30:00
# 9415795: 23:31:00
# 9415796: 23:32:00
# 9415797: NA
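The shifting step can be tried on a tiny table before running it on a full feed. The sketch below uses made-up data (the `toy` table, its trip ids and stop ids are all hypothetical) and only the `data.table` calls already used above; it shows that `shift(..., type = "lead")` with `by = trip_id` restarts at each trip, so the last stop of every trip gets `NA` rather than the first stop of the next trip.

```r
library(data.table)

# Hypothetical toy data: two trips in the shape of GTFS stop_times
toy <- data.table(
  trip_id       = c("T1", "T1", "T1", "T2", "T2"),
  stop_id       = c("A", "B", "C", "B", "D"),
  arrival_time  = c("06:20:00", "06:22:00", "06:24:00", "07:00:00", "07:05:00"),
  stop_sequence = c(10L, 20L, 30L, 10L, 20L)
)

# Order the stops within each trip
setorder(toy, trip_id, stop_sequence)

# Pull the next row's values up, restarting at every trip_id
toy[, `:=`(stop_id_to = shift(stop_id, type = "lead"),
           arrival_time_stop_to = shift(arrival_time, type = "lead")),
    by = trip_id]

print(toy)
```

Rows where `stop_id_to` is `NA` mark the end of a trip, which answers the question's worry about separating trips without writing an explicit loop.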
Now, to use graph_from_data_frame {igraph}, all you need is:
# get a data frame of nodes
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]
# links between stops; drop the last stop of each trip, whose stop_id_to is NA
links <- dt_stop_times[!is.na(stop_id_to), .(stop_id, stop_id_to, trip_id)]
# create graph
g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)
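One detail is worth checking on a small example: the rows for the last stop of each trip carry `NA` in `stop_id_to`, and as far as I can tell `graph_from_data_frame` does not handle `NA` endpoints cleanly when a `vertices` table is supplied. A minimal sketch, assuming made-up tables (`toy_stops` and `toy_links` are hypothetical stand-ins for `dt_stops` and the shifted `dt_stop_times`), filters those rows out first:

```r
library(data.table)
library(igraph)

# Hypothetical stand-ins for dt_stops and the shifted dt_stop_times
toy_stops <- data.table(
  stop_id  = c("A", "B", "C", "D"),
  stop_lon = c(144.96, 144.97, 144.98, 144.99),
  stop_lat = c(-37.81, -37.80, -37.79, -37.78)
)
toy_links <- data.table(
  stop_id    = c("A", "B", "C", "B", "D"),
  stop_id_to = c("B", "C", NA, "D", NA),
  trip_id    = c("T1", "T1", "T1", "T2", "T2")
)

# Keep only rows that describe a real hop between two stops
toy_links <- toy_links[!is.na(stop_id_to)]

# The first two columns are read as from/to; trip_id becomes an edge attribute
g_toy <- graph_from_data_frame(toy_links, directed = TRUE, vertices = toy_stops)

vcount(g_toy)     # number of stops
ecount(g_toy)     # number of hops
E(g_toy)$trip_id  # trip each hop belongs to
```

Stops that no trip visits still appear as isolated vertices, because the vertex set comes from the stops table rather than from the links.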
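Note, however, that a GTFS.zip file may contain several transport modes (train, bus, metro, etc.), and that, because service frequency varies, some stop pairs are much more strongly connected than others. It is not yet clear to me how these two points should be taken into account when building a graph from GTFS.zip. A possible way forward would be to weight each edge by its frequency and to build a multilayer network in which each transport mode is treated as an interdependent layer, with some stops in common.
Source: Stack Overflow, "r - Creating an iGraph graph from GTFS data using R": https://stackoverflow.com/questions/41097918/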
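The edge-weighting idea mentioned in the answer could be sketched roughly as follows. This is only one possible reading of "frequency" (the number of trips that use a given hop), the `toy_links` table is made up, and the transport-mode layers are not handled here at all.

```r
library(data.table)
library(igraph)

# Hypothetical links table: one row per hop per trip, NA rows already removed
toy_links <- data.table(
  stop_id    = c("A", "B", "A", "B", "B"),
  stop_id_to = c("B", "C", "B", "C", "D"),
  trip_id    = c("T1", "T1", "T2", "T2", "T3")
)

# Collapse to one row per directed stop pair;
# weight = number of trips that make that hop (an assumed definition)
toy_edges <- toy_links[, .(weight = .N), by = .(stop_id, stop_id_to)]

# The extra 'weight' column becomes an edge attribute
g_w <- graph_from_data_frame(toy_edges, directed = TRUE)

E(g_w)$weight
```

Collapsing the rows first keeps the graph simple (one edge per stop pair). Keeping one edge per trip instead, as in the answer's `links` table, preserves `trip_id` but produces a multigraph.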