Problem description

I have a character vector of dates whose formats are mostly dmY (e.g. 27-09-2013) and dmy (e.g. 27-09-13), occasionally with abbreviated (b) or full (B) month names. parse_date_time in the lubridate package, which "allows the user to specify several format-orders to handle heterogeneous date-time character representations", could therefore be a very useful function for me.

However, parse_date_time seems to have a problem parsing dmy dates when they occur together with dmY dates. When parsing dmy alone, or dmy together with some other formats relevant to me, it works fine. This pattern was also noted in a comment to @Peyton's answer here. A quick fix was suggested, but I would like to know whether it can be handled within lubridate.

Here are some examples in which I try to parse dates in dmy format together with some other formats, specifying orders accordingly.

library(lubridate)
# version: lubridate_1.3.0

# regarding how date format is specified in 'orders':
# examples in ?parse_date_time
# parse_date_time(x, "ymd")
# parse_date_time(x, "%y%m%d")
# parse_date_time(x, "%y %m %d")
# these order strings are equivalent and parse the same way
# "Formatting orders might include arbitrary separators. These are discarded"

# dmy date only
parse_date_time(x = "27-09-13", orders = "d m y")
# [1] "2013-09-27 UTC"
# OK

# dmy & dBY
parse_date_time(c("27-09-13", "27 September 2013"), orders = c("d m y", "d B Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK

# dmy & dbY
parse_date_time(c("27-09-13", "27 Sep 2013"), orders = c("d m y", "d b Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"
# OK

# dmy & dmY
parse_date_time(c("27-09-13", "27-09-2013"), orders = c("d m y", "d m Y"))
# [1] "0013-09-27 UTC" "2013-09-27 UTC"
# not OK

# does order of the date components matter?
parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y m d", "y m d"))
# [1] "2013-09-27 UTC" "0013-09-27 UTC"
# no
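As a side note, a year of 0013 is what a %Y conversion produces when it is handed a two-digit year. The sketch below uses base R only and does not call lubridate, so it is merely an assumption about what happens inside parse_date_time, not a confirmed diagnosis. It parses the same string once with %y and once with %Y and compares the resulting years.

```r
# Base R only: parse the same two-digit-year string with %y and with %Y.
x <- "27-09-13"
with_y <- as.Date(x, format = "%d-%m-%y")  # two-digit year, century inferred
with_Y <- as.Date(x, format = "%d-%m-%Y")  # year digits taken literally

# Extract the numeric years without relying on how the platform prints them.
year_y <- as.POSIXlt(with_y)$year + 1900
year_Y <- as.POSIXlt(with_Y)$year + 1900
```

Here year_y is 2013 while year_Y is 13, which matches the "0013-09-27" seen above when a dmy string ends up being read with a %Y format.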

What about the select_formats argument? I am sorry to say that I have a hard time understanding this section of the help file, and a search for select_formats on SO gave 0 results. Still, this part seemed relevant: "By default the formats with most formating tockens (%) are selected and %Y counts as 2.5 tockens (so that it can have priority over %y%m)." So I (desperately) tried with some additional dmy dates:

parse_date_time(c("27-09-2013", rep("27-09-13", 10)), orders = c("d m y", "d m Y"))
# not OK. Tried also 100 dmy dates.

# does order in the vector matter?
parse_date_time(c(rep("27-09-13", 10), "27-09-2013"), orders = c("d m y", "d m Y"))
# no
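The scoring rule quoted from the help file can be sketched in a few lines of base R. The helper name score_formats is hypothetical and the two format strings are chosen for illustration; this only mirrors the documented rule (one point per % token, with %Y counting as 2.5) and is not lubridate's actual code path.

```r
# Hypothetical helper mirroring the documented rule: one point per "%" token,
# plus 1.5 extra when the format contains %Y (so %Y counts as 2.5 in total).
score_formats <- function(fmts) {
  nchar(gsub("[^%]", "", fmts)) + grepl("%Y", fmts) * 1.5
}

fmts <- c("%d-%m-%y", "%d-%m-%Y")
scores <- score_formats(fmts)

# The format with the highest score is the one that would be preferred.
best <- fmts[which.max(scores)]
```

Under this rule the %Y format always outscores the %y format, no matter how many dmy strings are in the input, which is consistent with adding more dmy dates making no difference.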

I then checked how the guess_formats function (also in lubridate) handles dmy together with dmY:

guess_formats(c("27-09-13", "27-09-2013"), c("dmy", "dmY"), print_matches = TRUE)
#                   dmy        dmY
# [1,] "27-09-13"   "%d-%m-%y" ""
# [2,] "27-09-2013" "%d-%m-%Y" "%d-%m-%Y"
# OK

From ?guess_formats: "y also matches Y". From ?parse_date_time: "y*: Year without century (00–99 or 0–99). Also matches year with century (Y format)." So I tried:

guess_formats(c("27-09-13", "27-09-2013"), c("dmy"), print_matches = TRUE)
#                   dmy
# [1,] "27-09-13"   "%d-%m-%y"
# [2,] "27-09-2013" "%d-%m-%Y"
# OK

Thus, guess_formats seems to be able to deal with dmy together with dmY. But how can I tell parse_date_time to do the same? Thanks in advance for any comments or help.

Update: I posted the question as a lubridate bug report and got a rapid reply from @vitoshka: "This is a bug".

Recommended answer

It looks like a bug. I am not sure, so you should contact the maintainer.

Building the package from source and changing one line in this internal function (I replaced which.max with which.min):

.select_formats <- function(trained) {
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%Y", names(trained)) * 1.5
  names(trained[which.min(n_fmts)])  # was which.max; replaced by which.min
}

seems to correct the problem. Frankly, I don't know why this works, but I guess it is a kind of ranking.

parse_date_time(c("27-09-13", "27-09-2013"), orders = c("d m y", "d m Y"))
# [1] "2013-09-27 UTC" "2013-09-27 UTC"

parse_date_time(c("2013-09-27", "13-09-13"), orders = c("Y m d", "y m d"))
# [1] "2013-09-27 UTC" "2013-09-13 UTC"
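
If rebuilding the package is not an option, one possible workaround outside lubridate is to choose the format for each element from the width of its year field. This is a base R sketch, not the fix applied by the maintainers. It assumes dash-separated numeric dates with the year last, and it relies on base R's century rule for %y (00 to 68 map to 20xx, 69 to 99 to 19xx).

```r
x <- c("27-09-13", "27-09-2013")

# Width of the last dash-separated field, i.e. the year (assumed to come last).
year_width <- nchar(sub("^.*-", "", x))

# Pick %y for two-digit years and %Y otherwise, one format per element.
fmt <- ifelse(year_width == 2, "%d-%m-%y", "%d-%m-%Y")

parsed <- as.Date(x, format = fmt)
```

Both elements of parsed then refer to 27 September 2013. Strings with month names (b or B) would need their own branch, which this sketch does not cover.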
