I have the following R data frame:

zed
# A tibble: 10 x 3
   jersey_number first_name statistics.minutes
   <chr>         <chr>      <chr>
 1 20            Marques    8:20
 2 53            Brennan    00:00
 3 35            Marvin     40:00
 4 50            Justin     00:00
 5 14            Jordan     00:00
 6 1             Trevon     31:00
 7 15            Alex       2:00
 8 51            Mike       00:00
 9 12            Javin      17:00
10 3             Grayson    38:00

> dput(zed)
structure(list(jersey_number = c("20", "53", "35", "50", "14",
"1", "15", "51", "12", "3"), first_name = c("Marques", "Brennan",
"Marvin", "Justin", "Jordan", "Trevon", "Alex", "Mike", "Javin",
"Grayson"), statistics.minutes = c("8:20", "00:00", "40:00",
"00:00", "00:00", "31:00", "2:00", "00:00", "17:00", "38:00")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))

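This is the format in which I receive the data from an API. All columns (roughly 100 of them) start out as class character. To convert everything I use readr::type_convert(), but the minutes column comes out wrong: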
> zed %>% readr::type_convert()
Parsed with column specification:
cols(
  jersey_number = col_integer(),
  first_name = col_character(),
  statistics.minutes = col_time(format = "")
)
# A tibble: 10 x 3
   jersey_number first_name statistics.minutes
           <int> <chr>      <time>
 1            20 Marques    08:20
 2            53 Brennan    00:00
 3            35 Marvin        NA
 4            50 Justin     00:00
 5            14 Jordan     00:00
 6             1 Trevon        NA
 7            15 Alex       02:00
 8            51 Mike       00:00
 9            12 Javin      17:00
10             3 Grayson       NA

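I would like this minutes column to end up as class numeric instead of being parsed as a time and filled with NA values. If a row of this column reads "8:20", I want it simply converted to 8.33.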

Any ideas on how to do this? Ideally the solution would let me keep using type_convert.

Best answer

library(lubridate)
Read in df without any changes (your dput code).

Prepend an hours field to the minutes:seconds strings:

df$statistics.minutes <- paste0("00:", df$statistics.minutes)

Convert to a lubridate period:
df$statistics.minutes <- lubridate::hms(df$statistics.minutes)

Convert to seconds and divide by 60 to get minutes:
period_to_seconds(df$statistics.minutes) / 60

Result:
 [1]  8.333333  0.000000 40.000000  0.000000  0.000000
 [6] 31.000000  2.000000  0.000000 17.000000 38.000000

If needed, overwrite the column in df:
df$statistics.minutes <- period_to_seconds(df$statistics.minutes) / 60
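
As a possible shortcut, lubridate also provides ms(), which parses minutes:seconds strings directly, so the "00:" prefix may not be needed. The following is a minimal sketch, not part of the original answer; the vector minutes_raw is made up here for illustration and reuses a few values from the question.

```r
library(lubridate)

# Illustrative input: a few "MM:SS" strings taken from the question's data.
minutes_raw <- c("8:20", "00:00", "40:00", "31:00")

# ms() reads each string as minutes and seconds (no hours field needed),
# period_to_seconds() turns the period into seconds, and dividing by 60
# gives decimal minutes.
minutes_num <- period_to_seconds(ms(minutes_raw)) / 60

print(minutes_num)
```

Both routes should give the same numbers; ms() just skips the string manipulation.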

[ Addition by the OP ] :-)

I created the following helper function, based on this answer, so that I can fix the column without breaking my pipe chain:
fixMinutes <- function(raw.data) {

  new.raw.data <- raw.data %>%
    dplyr::mutate(statistics.minutes = paste0("00:", statistics.minutes)) %>%
    dplyr::mutate(statistics.minutes = lubridate::hms(statistics.minutes)) %>%
    dplyr::mutate(statistics.minutes = lubridate::period_to_seconds(statistics.minutes) / 60)

  return(new.raw.data)
}

zed %>%
  ... %>%
  fixMinutes() %>%
  ... %>%
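
To keep using type_convert, one option is to convert the minutes column to numeric before calling it. type_convert only re-parses character columns, so a column that is already numeric should pass through untouched. The NA values in the question are consistent with readr reading strings such as "40:00" as hours:minutes, although that is an inference from the output and not something the original thread confirms. The sketch below is an assumption-laden illustration: the helper mmss_to_minutes is hypothetical (it is not from the answer) and uses base R string splitting instead of lubridate, and the small zed tibble is a three-row excerpt of the question's data.

```r
library(dplyr)
library(readr)

# Three-row excerpt of the question's data, all columns as character.
zed <- tibble::tibble(
  jersey_number      = c("20", "35", "1"),
  first_name         = c("Marques", "Marvin", "Trevon"),
  statistics.minutes = c("8:20", "40:00", "31:00")
)

# Hypothetical helper (not from the original answer):
# turns "MM:SS" strings into decimal minutes using base R only.
mmss_to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(
    parts,
    function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
    numeric(1)
  )
}

# Convert the minutes column first, then let type_convert() guess
# the types of the remaining character columns.
converted <- zed %>%
  mutate(statistics.minutes = mmss_to_minutes(statistics.minutes)) %>%
  type_convert()

print(converted)
```

With around 100 columns, the same idea applies: fix the known problem columns first and leave the rest to type_convert.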

For a similar question about readr::type_convert mangling a time column, see Stack Overflow: https://stackoverflow.com/questions/53982826/
