Problem description
I have several years of data that I'm trying to work into a zoo object (.csv at Dropbox). I get a warning once the data is coerced into a zoo object. I cannot find any duplicates in the index.
df <- read.csv(choose.files(default = "", caption = "Select data source", multi = FALSE), na.strings="*")
df <- read.zoo(df, format = "%Y/%m/%d %H:%M", regular = TRUE, row.names = FALSE, col.names = TRUE, index.column = 1)
Warning message:
In zoo(rval3, ix) :
some methods for "zoo" objects do not work if the index entries in ‘order.by’ are not unique
I have tried:
sum(duplicated(df$NST_DATI))
but the result is 0.
Thank you for your help!
Recommended answer
You are using read.zoo(...) incorrectly. According to the documentation:
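To process the index, read.zoo calls FUN with the index as the first argument. If FUN is not specified, then if there are multiple index columns they are pasted together with a space between each. Using the index column or pasted index column:
1. If tz is specified then the index column is converted to POSIXct.
2. If format is specified then the index column is converted to Date.
3. Otherwise, a heuristic attempts to decide between "numeric", "Date" and "POSIXct".
If format and/or tz is specified then they are passed to the conversion function as well.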
You are specifying format=..., so read.zoo(...) converts everything to Date, not POSIXct. A Date drops the time of day, so all timestamps that fall on the same day collapse to the same index value. The raw timestamps can be unique, yet after the conversion there are many, many duplicated dates.
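The effect of converting timestamps to Date can be reproduced in base R without zoo. This is only a minimal sketch: the three timestamps are made up for illustration and are not taken from the original .csv, and as.Date() stands in for the conversion that read.zoo(...) performs when only format is given.

```r
# Hypothetical timestamps in the same layout as the question's index column
stamps <- c("2010/01/01 00:00", "2010/01/01 00:15", "2010/01/02 00:00")

# As character strings, the timestamps are all distinct
print(sum(duplicated(stamps)))

# Converting to Date keeps only the calendar day, discarding %H:%M
dates <- as.Date(stamps, format = "%Y/%m/%d %H:%M")
print(dates)

# The first two entries now share the same Date, so a duplicate appears
print(sum(duplicated(dates)))
```

Checking for duplicates on the converted values, rather than on the raw strings, is what exposes the problem.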
Simplistically, the correct solution is to use:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M")
# Error in read.zoo(df, FUN = as.POSIXct, format = "%Y/%m/%d %H:%M") :
# index has bad entries at data rows: 507 9243 18147 26883 35619 44355
but as you can see, this does not work either. Here the problem is much more subtle. The index is converted to POSIXct, but in the system time zone (which on my system is US Eastern). The referenced rows have timestamps that coincide with the changeover from Standard Time to DST, so these times do not exist in the US Eastern time zone. If you use:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC")
the data imports correctly.
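The DST gap described above can be illustrated in base R. This is a sketch under assumptions: the timestamp is a made-up example inside the 2010 spring-forward hour for US Eastern, not a row from the original data, and the zone name "America/New_York" is assumed to be available on the system.

```r
# Hypothetical wall-clock time inside the US Eastern spring-forward gap
# (on 14 March 2010, clocks jumped from 02:00 straight to 03:00)
stamp <- "2010/03/14 02:30"
fmt <- "%Y/%m/%d %H:%M"

# In a DST-observing zone this local time never occurred. The result is
# typically NA, though the exact behaviour can depend on platform and R version.
eastern <- as.POSIXct(stamp, format = fmt, tz = "America/New_York")
print(is.na(eastern))

# UTC has no DST transitions, so the same string parses to a valid time
utc <- as.POSIXct(stamp, format = fmt, tz = "UTC")
print(utc)
```

An NA produced this way is the kind of entry that read.zoo(...) reports as a bad index entry.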
EDIT:
As @G.Grothendieck points out, this would also work, and is simpler:
df <- read.zoo(df, tz="UTC")
You should set tz to whatever time zone is appropriate for the dataset.