r - Creating an iGraph graph from GTFS data using R

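My goal is to convert GTFS stop and trip information into a graph in which the vertices are stops (from the GTFS stops.txt) and the edges are trips (from the GTFS stop_times.txt). The first step is straightforward: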

library(igraph)

# Read in the GTFS files
stops <- read.csv("stops.txt")
stop_times <- read.csv("stop_times.txt")

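My first instinct was simply to use igraph's graph_from_data_frame function, but there is a serious drawback: the stop_times data frame is not structured in the schema that function needs. Its structure is as follows: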

head(stop_times)
  trip_id stop_id arrival_time departure_time stop_sequence shape_dist_traveled
1 A895151  F04272     06:20:00       06:20:00            10                   0
2 A895151  F04184     06:22:00       06:22:00            20                 648
3 A895151  F04319     06:24:00       06:24:00            30                1224
4 A895151  F04369     06:27:00       06:27:00            40                2779
5 A895151  008264     06:31:00       06:31:00            50                5620
6 A895151  F01520     06:33:00       06:33:00            60                6691

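That is, each row holds a stop_id with the arrival and departure times at that stop, whereas I want each row to contain start_stop_id, end_stop_id, start_time, and end_time (in effect, not "stops" but "transits" derived from the stops). This transformation seems challenging to me: I would have to loop over the rows of stop_times, decide whether consecutive rows belong to the same trip_id, compute the start and end data if they do, and otherwise insert NULL or find some other way to separate the trips. This is where I am stuck.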

Is there an elegant way to combine these two data frames into the desired graph?

Best answer

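The 'from' and 'to' columns can be generated by shifting the values from the next row up into the current row, and the stop information can simply be joined on.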

Here is an example that illustrates this, using library(data.table):

library(data.table)

## here I'm using Melbourne's GTFS ("http://transitfeeds.com/p/ptv/497/latest/download")

#dt_stop_times <- lst[[6]]$stop_times
#dt_stops <- lst[[7]]$stops

#setDT(dt_stop_times)
#setDT(dt_stops)


## join on whatever stop information you want
dt_stop_times <- dt_stop_times[ dt_stops, on = c("stop_id"), nomatch = 0]

## set the order of stops for each group (in this case, each group is a trip_id)
setorder(dt_stop_times, trip_id, stop_sequence)

## create a new column by shifting the stop_id of the following row up
dt_stop_times[, stop_id_to := shift(stop_id, type = "lead"), by = .(trip_id)]

## you will have NAs at this point because the last stop doesn't go anywhere.

## you can do the same operation on multiple columns at the same time
dt_stop_times[, `:=`(stop_id_to = shift(stop_id, type = "lead"),
                     arrival_time_stop_to = shift(arrival_time, type = "lead"),
                     departure_time_stop_to = shift(departure_time, type = "lead")),
              by = .(trip_id)]

## now you have your 'from' and 'to' columns from which you can make your igraph

## here's a subset of the result
dt_stop_times[, .(trip_id, stop_id, stop_name_from = stop_name, arrival_time, stop_id_to, arrival_time_stop_to)]

#                           trip_id stop_id                                                  stop_name_from arrival_time stop_id_to
# 1:          1.T0.3-86-A-mjp-1.7.R    4174                                    71-RMIT/Plenty Rd (Bundoora)     25:42:00       4485
# 2:          1.T0.3-86-A-mjp-1.7.R    4485                            70-Janefield Dr/Plenty Rd (Bundoora)     25:43:00       4486
# 3:          1.T0.3-86-A-mjp-1.7.R    4486                              69-Taunton Dr/Plenty Rd (Bundoora)     25:44:00       4487
# 4:          1.T0.3-86-A-mjp-1.7.R    4487                           68-Greenhills Rd/Plenty Rd (Bundoora)     25:45:00       4488
# 5:          1.T0.3-86-A-mjp-1.7.R    4488                      67-Bundoora Square SC/Plenty Rd (Bundoora)     25:46:00       4489
# ---
# 9415793: 9999.UQ.3-19-E-mjp-1.1.H   17871           7-Queen Victoria Market/Elizabeth St (Melbourne City)     23:25:00      17873
# 9415794: 9999.UQ.3-19-E-mjp-1.1.H   17873       5-Melbourne Central Station/Elizabeth St (Melbourne City)     23:27:00      17875
# 9415795: 9999.UQ.3-19-E-mjp-1.1.H   17875              3-Bourke Street Mall/Elizabeth St (Melbourne City)     23:30:00      17876
# 9415796: 9999.UQ.3-19-E-mjp-1.1.H   17876                      2-Collins St/Elizabeth St (Melbourne City)     23:31:00      17877
# 9415797: 9999.UQ.3-19-E-mjp-1.1.H   17877 1-Flinders Street Railway Station/Elizabeth St (Melbourne City)     23:32:00         NA
#          arrival_time_stop_to
# 1:                   25:43:00
# 2:                   25:44:00
# 3:                   25:45:00
# 4:                   25:46:00
# 5:                   25:47:00
# ---
# 9415793:             23:27:00
# 9415794:             23:30:00
# 9415795:             23:31:00
# 9415796:             23:32:00
# 9415797:                   NA
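To make the shift step easier to follow, here is a minimal sketch on toy data. The first trip reuses three rows from the question's stop_times sample; the second trip ("B1") and its times are made up purely for illustration. It shows that shifting within each trip_id leaves exactly one NA per trip (the final stop), which can then be dropped to get one row per stop-to-stop transit.

```r
library(data.table)

# Toy stop_times: trip A895151 comes from the question's sample,
# trip "B1" is invented for illustration only
toy <- data.table(
  trip_id       = c("A895151", "A895151", "A895151", "B1", "B1"),
  stop_id       = c("F04272", "F04184", "F04319", "F04319", "F04272"),
  arrival_time  = c("06:20:00", "06:22:00", "06:24:00", "07:00:00", "07:05:00"),
  stop_sequence = c(10, 20, 30, 10, 20)
)

# Order the stops within each trip, then pull the next row's values up
setorder(toy, trip_id, stop_sequence)
toy[, `:=`(stop_id_to           = shift(stop_id, type = "lead"),
           arrival_time_stop_to = shift(arrival_time, type = "lead")),
    by = trip_id]

# The last stop of each trip has no successor, so its stop_id_to is NA;
# dropping those rows leaves one row per transit
edges <- toy[!is.na(stop_id_to)]
print(edges)
```

Because the shift is done with by = trip_id, the last stop of one trip is never linked to the first stop of the next.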

Now, to use graph_from_data_frame {igraph}, all you need is:

# get a data frame of nodes
nodes <- dt_stops[, .(stop_id, stop_lon, stop_lat)]

# links between stops; drop the last stop of each trip, whose stop_id_to is NA
links <- dt_stop_times[!is.na(stop_id_to), .(stop_id, stop_id_to, trip_id)]

# create the graph
g <- graph_from_data_frame(links, directed = TRUE, vertices = nodes)
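The question also asks for start and end times on each edge. One possible way to carry them into the graph is sketched below: any columns after the first two in the data frame passed to graph_from_data_frame become edge attributes. The helper gtfs_time_to_seconds is a hypothetical name, not part of igraph or data.table; it is needed because GTFS times can exceed 24:00:00 (such as 25:42:00 in the output above) and so cannot be parsed as ordinary clock times. The toy rows are taken from the question's sample.

```r
library(data.table)
library(igraph)

# Hypothetical helper (not from any package): convert "HH:MM:SS" strings,
# possibly with hours >= 24, into seconds since the start of the service day
gtfs_time_to_seconds <- function(x) {
  parts <- tstrsplit(x, ":", fixed = TRUE)
  as.integer(parts[[1]]) * 3600L +
    as.integer(parts[[2]]) * 60L +
    as.integer(parts[[3]])
}

# Toy links built from the question's sample rows (trip A895151)
links_toy <- data.table(
  stop_id              = c("F04272", "F04184", "F04319"),
  stop_id_to           = c("F04184", "F04319", "F04369"),
  trip_id              = "A895151",
  departure_time       = c("06:20:00", "06:22:00", "06:24:00"),
  arrival_time_stop_to = c("06:22:00", "06:24:00", "06:27:00")
)

# Travel time of each transit, in seconds
links_toy[, travel_time_s := gtfs_time_to_seconds(arrival_time_stop_to) -
                             gtfs_time_to_seconds(departure_time)]

# The first two columns define the edges; the rest become edge attributes
g_toy <- graph_from_data_frame(links_toy, directed = TRUE)
print(E(g_toy)$travel_time_s)
```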

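Note, however, that a GTFS.zip file may contain several transport modes (train, bus, metro, etc.), and that because service frequencies vary, some pairs of stops are far more strongly connected than others. It is not yet clear to me how these two points should be taken into account when building a graph from GTFS.zip. One possible way forward is to weight each edge by its frequency and to build a multilayer network in which each transport mode is treated as an interdependent layer, with the layers sharing some stops in common.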
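As a rough sketch of the frequency-weighting idea only (the multilayer part is not attempted here), the per-trip links can be collapsed into one edge per stop pair, with the number of trips stored as an edge attribute. The toy stop and trip IDs and the attribute name n_trips are made up for illustration.

```r
library(data.table)
library(igraph)

# Toy per-trip links: one row per transit, as produced by the shift step
links_toy <- data.table(
  stop_id    = c("A", "B", "A", "B", "A", "C"),
  stop_id_to = c("B", "C", "B", "C", "B", "D"),
  trip_id    = c("t1", "t1", "t2", "t2", "t3", "t4")
)

# Collapse to one row per directed stop pair, counting the trips serving it
edge_freq <- links_toy[, .(n_trips = .N), by = .(stop_id, stop_id_to)]

# n_trips becomes an edge attribute of the graph
g_freq <- graph_from_data_frame(edge_freq, directed = TRUE)
print(E(g_freq)$n_trips)
```

The count is deliberately stored as n_trips rather than weight: igraph's shortest-path functions treat an edge attribute named weight as a cost, so a raw frequency would need to be transformed (for example inverted) before being used that way.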

Regarding "r - Creating an iGraph graph from GTFS data using R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41097918/