r - Using parse_date_time to parse dates in dmy and dmY format

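I have a vector of character representations of dates whose formats are mostly dmY (e.g. 27-09-2013) and dmy (e.g. 27-09-13), with the occasional b or B month. So parse_date_time in package lubridate, which "allows the user to specify several format-orders to handle heterogeneous date-time character representations", could be a very useful function for me.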

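However, parse_date_time seems to have problems parsing dmy dates when they occur together with dmY dates. When parsing dmy or dmY alone, or together with the other formats relevant to me, it works fine. This pattern was also noted in a comment on @Peyton's answer here. A quick fix was suggested there, but I would like to ask whether it can be handled within lubridate.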

Below are some examples in which I try to parse dates in dmy format together with other formats, specifying orders accordingly.

library(lubridate)
# version: lubridate_1.3.0

# regarding how date format is specified in 'orders':
# examples in ?parse_date_time
# parse_date_time(x, "ymd")
# parse_date_time(x, "%y%m%d")
# parse_date_time(x, "%y %m %d")
# these order strings are equivalent and parse the same way
# "Formatting orders might include arbitrary separators. These are discarded"

# dmy date only
parse_date_time(x = "27-09-13", orders = "d m y")
# [1] "2013-09-27 UTC"
# OK

# dmy & dBY
parse_date_time(c("27-09-13", "27 September 2013"), orders = c("d m y", "d B Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK

# dmy & dbY
parse_date_time(c("27-09-13", "27 Sep 2013"), orders = c("d m y", "d b Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK

# dmy & dmY
parse_date_time(c("27-09-13", "27-09-2013"), orders = c("d m y", "d m Y"))
# [1] "0013-09-27 UTC" "2013-09-27 UTC"
# not OK

# does order of the date components matter?
parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y m d", "y m d"))
# [1] "2013-09-27 UTC" "0013-09-27 UTC"
# no
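
A side note on the "0013" year in the output above: it is what a %Y format yields when it is given a two-digit year. The base R sketch below shows the same effect without lubridate. That parse_date_time is applying the dmY format to the dmy strings is only an assumption drawn from this similarity, not something the examples confirm.

```r
# Base R only: the same two-digit-year string read with %y and with %Y
x <- "27-09-13"

as_y <- as.Date(x, format = "%d-%m-%y")  # %y: year without century
as_Y <- as.Date(x, format = "%d-%m-%Y")  # %Y: year with century

# Compare the parsed years numerically
as.integer(format(as_y, "%Y"))  # 2013
as.integer(format(as_Y, "%Y"))  # 13
```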

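What about the select_formats argument? I am sorry to say it, but I have a hard time understanding this section of the help file. And a search for select_formats on SO gives 0 results. Still, this section seemed relevant: "By default the formats with most formatting tokens (%) are selected and %Y counts as 2.5 tokens (so that it has a priority over %y%m)." So I (desperately) tried adding some more dmy dates: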

parse_date_time(c("27-09-2013", rep("27-09-13", 10)), orders = c("d m y", "d m Y"))
# not OK. Tried also 100 dmy dates.

# does order in the vector matter?
parse_date_time(c(rep("27-09-13", 10), "27-09-2013"), orders = c("d m y", "d m Y"))
# no
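
To make the ranking rule quoted from the help file concrete, here is a minimal sketch of that scoring in base R. The helper name score_formats is hypothetical and not part of lubridate; the scoring expression follows the quoted rule (one point per % token, %Y counted as 2.5).

```r
# Hypothetical helper, not a lubridate function: scores format strings
# the way the quoted help text describes.
score_formats <- function(fmts) {
  n_tokens <- nchar(gsub("[^%]", "", fmts))  # one point per % token
  bonus    <- grepl("%Y", fmts) * 1.5        # %Y counts as 2.5 instead of 1
  setNames(n_tokens + bonus, fmts)
}

scores <- score_formats(c("%d-%m-%y", "%d-%m-%Y"))
scores                    # 3 for the %y format, 4.5 for the %Y format
names(which.max(scores))  # "%d-%m-%Y"
```

Under this rule the %Y variant always outranks the %y variant, however many dmy dates the input contains, which would be consistent with the results above.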

Then I checked how the guess_formats function (also in lubridate) handles dmy together with dmY:

guess_formats(c("27-09-13", "27-09-2013"), c("dmy", "dmY"), print_matches = TRUE)
#                   dmy        dmY
# [1,] "27-09-13"   "%d-%m-%y" ""
# [2,] "27-09-2013" "%d-%m-%Y" "%d-%m-%Y"
# OK

From ?parse_date_time: "y* Year without century (00–99 or 0–99). Also matches year with century (Y format)." From ?guess_formats: "y also matches Y". So I tried:

guess_formats(c("27-09-13", "27-09-2013"), c("dmy"), print_matches = TRUE)
#                   dmy
# [1,] "27-09-13"   "%d-%m-%y"
# [2,] "27-09-2013" "%d-%m-%Y"
# OK

So guess_formats seems able to handle dmy together with dmY. But how can I tell parse_date_time to do the same? Thanks in advance for any comments or help.

Update
I posted the question as a lubridate bug report and got a quick reply from @vitoshka: "This is a bug".

Best answer

It looks like a bug. I am not sure, though, so you should contact the maintainer.

Building the package from source and changing one line in this internal function (I replaced which.max with which.min):

.select_formats <-   function(trained){
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%Y", names(trained))*1.5
  names(trained[ which.min(n_fmts) ]) ## replace which.max  by which.min
}

seems to solve the problem. Frankly, I don't know why this works, but I guess it is some kind of ranking.

parse_date_time(c("27-09-13", "27-09-2013"), orders = c("d m y", "d m Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"

parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y m d", "y m d"))
# [1] "2013-09-27 UTC" "2013-09-13 UTC"
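
If rebuilding the package is not an option, a workaround along the following lines might help. This is only a sketch: the helper parse_dmy_mixed is hypothetical, it uses base R rather than lubridate, it assumes "-" as the separator and numeric months only, and it returns Date rather than POSIXct values. It is not necessarily the quick fix mentioned in the question.

```r
# Hypothetical helper: handle d-m-y and d-m-Y strings separately,
# choosing the format by the width of the trailing year.
parse_dmy_mixed <- function(x) {
  four_digit <- grepl("[0-9]{4}$", x)   # string ends in a four-digit year
  out <- as.Date(rep(NA, length(x)))    # all-NA Date vector to fill in
  out[four_digit]  <- as.Date(x[four_digit],  format = "%d-%m-%Y")
  out[!four_digit] <- as.Date(x[!four_digit], format = "%d-%m-%y")
  out
}

res <- parse_dmy_mixed(c("27-09-13", "27-09-2013"))
format(res)  # both elements are 2013-09-27
```

Splitting on the year width avoids any ranking between the two formats altogether, at the cost of having to add a branch for every further format (such as b or B months) by hand.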