Problem description
I have a character vector of dates whose formats are mostly dmY (e.g. 27-09-2013) and dmy (e.g. 27-09-13), with occasional b or B months. parse_date_time in package lubridate, which "allows the user to specify several format-orders to handle heterogeneous date-time character representations", could therefore be a very useful function for me.
However, parse_date_time seems to have problems parsing dmy dates when they occur together with dmY dates. When parsing dmy alone, or dmy together with some other formats relevant to me, it works fine. This pattern was also noted in a comment to @Peyton's answer here. A quick fix was suggested, but I wish to ask whether it is possible to handle it in lubridate.
Here I show some examples where I try to parse dates in dmy format together with some other formats, specifying orders accordingly.
library(lubridate)
# version: lubridate_1.3.0
# regarding how date format is specified in 'orders':
# examples in ?parse_date_time
# parse_date_time(x, "ymd")
# parse_date_time(x, "%y%m%d")
# parse_date_time(x, "%y %m %d")
# these order strings are equivalent and parse the same way
# "Formatting orders might include arbitrary separators. These are discarded"
# dmy date only
parse_date_time(x = "27-09-13", orders = "d m y")
# [1] "2013-09-27 UTC"
# OK
# dmy & dBY
parse_date_time(c("27-09-13", "27 September 2013"), orders = c("d m y", "d B Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK
# dmy & dbY
parse_date_time(c("27-09-13", "27 Sep 2013"), orders = c("d m y", "d b Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK
# dmy & dmY
parse_date_time(c("27-09-13", "27-09-2013"), orders = c("d m y", "d m Y"))
# [1] "0013-09-27 UTC" "2013-09-27 UTC"
# not OK
# does order of the date components matter?
parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y m d", "y m d"))
# [1] "2013-09-27 UTC" "0013-09-27 UTC"
# no
What about the select_formats argument? I am sorry to say this, but I have a hard time understanding this section of the help file. And a search for select_formats on SO gave 0 results. Still, this section seemed relevant: "By default the formats with most formating tockens (%) are selected and %Y counts as 2.5 tockens (so that it can have priority over %y%m).". So I (desperately) tried with some additional dmy dates:
parse_date_time(c("27-09-2013", rep("27-09-13", 10)), orders = c("d m y", "d m Y"))
# not OK. Tried also 100 dmy dates.
# does order in the vector matter?
parse_date_time(c(rep("27-09-13", 10), "27-09-2013"), orders = c("d m y", "d m Y"))
# no
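As far as I can tell from the quoted sentence, the ranking can be reproduced in base R as below. This is only a sketch of the rule as the help text states it, not lubridate's actual code, and the names fmts, n_tokens and score are my own.

```r
# Two candidate formats for dates like "27-09-13" and "27-09-2013"
fmts <- c("%d-%m-%y", "%d-%m-%Y")

# Count the "%" tokens in each format ...
n_tokens <- nchar(gsub("[^%]", "", fmts))

# ... and let %Y weigh 2.5 instead of 1, i.e. add 1.5 when it is present
score <- n_tokens + grepl("%Y", fmts) * 1.5
names(score) <- fmts

score                   # 3.0 for the %y format, 4.5 for the %Y format
fmts[which.max(score)]  # "%d-%m-%Y"
```

Under this reading the %Y format outscores the %y format regardless of how many two-digit-year dates the vector holds, which would be consistent with the results above.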
I then checked how the guess_formats function (also in lubridate) handled dmy together with dmY:
guess_formats(c("27-09-13", "27-09-2013"), c("dmy", "dmY"), print_matches = TRUE)
# dmy dmY
# [1,] "27-09-13" "%d-%m-%y" ""
# [2,] "27-09-2013" "%d-%m-%Y" "%d-%m-%Y"
# OK
From ?guess_formats: "y also matches Y". From ?parse_date_time: "y* Year without century (00–99 or 0–99). Also matches year with century (Y format)". So I tried:
guess_formats(c("27-09-13", "27-09-2013"), c("dmy"), print_matches = TRUE)
# dmy
# [1,] "27-09-13" "%d-%m-%y"
# [2,] "27-09-2013" "%d-%m-%Y"
# OK
Thus, guess_formats seems to be able to deal with dmy together with dmY. But how can I tell parse_date_time to do the same? Thanks in advance for any comments or help.
Update: I posted the question as a lubridate bug report, and got a rapid reply from @vitoshka: "This is a bug".
Recommended answer
It looks like a bug. I am not sure, so you should contact the maintainer.
Building the package from source and changing one line in this internal function (I replaced which.max with which.min):
.select_formats <- function(trained) {
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%Y", names(trained)) * 1.5
  names(trained[which.min(n_fmts)])  ## replace which.max by which.min
}
seems to correct the problem. Frankly, I don't know why this works, but I guess it is a kind of ranking:
parse_date_time(c("27-09-13", "27-09-2013"), orders = c("d m y", "d m Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y m d", "y m d"))
# [1] "2013-09-27 UTC" "2013-09-13 UTC"
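If rebuilding the package is not an option, one possible workaround is to split the vector by the width of the year and parse each part with its own format. The sketch below uses base R's as.Date rather than lubridate, so it sidesteps the bug instead of fixing it. It assumes every date is day-month-year separated by "-" (month names such as b or B are not handled), and parse_dmy_mixed is a hypothetical helper name of my own. Note that base R reads %y values 00 to 68 as 20xx and 69 to 99 as 19xx.

```r
# Hypothetical helper: parse a mix of dd-mm-yy and dd-mm-yyyy strings
parse_dmy_mixed <- function(x) {
  # TRUE where the last "-" is followed by exactly two digits
  two_digit <- grepl("-[0-9]{2}$", x)

  # Start from an all-NA Date vector and fill each group separately
  out <- rep(as.Date(NA), length(x))
  out[two_digit]  <- as.Date(x[two_digit],  format = "%d-%m-%y")
  out[!two_digit] <- as.Date(x[!two_digit], format = "%d-%m-%Y")
  out
}

parse_dmy_mixed(c("27-09-13", "27-09-2013"))
# [1] "2013-09-27" "2013-09-27"
```

Unlike parse_date_time, this returns Date values rather than POSIXct, so wrap the result in as.POSIXct if a date-time is needed.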