Problem description
I'm using data.table and trying to add a new column, "Season", holding the season (e.g. summer, winter...) that corresponds to each row's value in a column called "MonthName".
I'm wondering whether there is a more efficient way to add a season column to a data table based on month values.
These are the first 6 of 300,000 observations; assume the table is called "dt".
rrp Year Month Finyear hourminute AvgPriceByTOD MonthName
1: 35.27500 1999 1 1999 00:00 33.09037 Jan
2: 21.01167 1999 1 1999 00:00 33.09037 Jan
3: 25.28667 1999 2 1999 00:00 33.09037 Feb
4: 18.42334 1999 2 1999 00:00 33.09037 Feb
5: 16.67499 1999 2 1999 00:00 33.09037 Feb
6: 18.90001 1999 2 1999 00:00 33.09037 Feb
I tried the following code:
dt[, Season := ifelse(MonthName == c("Jun", "Jul", "Aug"), "Winter",
               ifelse(MonthName == c("Dec", "Jan", "Feb"), "Summer",
               ifelse(MonthName == c("Sep", "Oct", "Nov"), "Spring",
               ifelse(MonthName == c("Mar", "Apr", "May"), "Autumn", NA))))]
Which returns:
rrp totaldemand Year Month Finyear hourminute AvgPriceByTOD MonthName Season
1: 35.27500 1999 1 1999 00:00 33.09037 Jan NA
2: 21.01167 1999 1 1999 00:00 33.09037 Jan Summer
3: 25.28667 1999 2 1999 00:00 33.09037 Feb Summer
4: 18.42334 1999 2 1999 00:00 33.09037 Feb NA
5: 16.67499 1999 2 1999 00:00 33.09037 Feb NA
6: 18.90001 1999 2 1999 00:00 33.09037 Feb Summer
I also get these warnings:
Warning messages:
1: In MonthName == c("Jun", "Jul", "Aug") :
longer object length is not a multiple of shorter object length
2: In MonthName == c("Dec", "Jan", "Feb") :
longer object length is not a multiple of shorter object length
3: In MonthName == c("Sep", "Oct", "Nov") :
longer object length is not a multiple of shorter object length
4: In MonthName == c("Mar", "Apr", "May") :
longer object length is not a multiple of shorter object length
Alongside this, for reasons I don't understand, some of the summer months are correctly assigned "Summer" while others are assigned NA. For example, rows 1 and 2 should both be summer, but they come back with different values.
Thanks in advance!
Recommended answer
One pretty straightforward way is to use a lookup table to map month names to seasons:
# create a named vector where names are the month names and elements are seasons
seasons <- rep(c("winter","spring","summer","fall"), each = 3)
names(seasons) <- month.abb[c(6:12,1:5)] # thanks thelatemail for pointing out month.abb
seasons
# Jun Jul Aug Sep Oct Nov Dec Jan
#"winter" "winter" "winter" "spring" "spring" "spring" "summer" "summer"
# Feb Mar Apr May
#"summer" "fall" "fall" "fall"
To use it:
dt[, season := seasons[MonthName]]
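As for why the original attempt produced NAs: `==` compares element by element and recycles the shorter vector, so `MonthName == c("Dec", "Jan", "Feb")` tests row 1 against "Dec", row 2 against "Jan", row 3 against "Feb", row 4 against "Dec" again, and so on. When the number of rows is not a multiple of 3, R also emits the "longer object length" warning. A membership test needs `%in%` instead. Here is a small base R sketch, using a made-up vector `m` that mirrors the first six rows of the sample data:

```r
# Hypothetical stand-in for dt$MonthName (first six rows of the sample data)
m <- c("Jan", "Jan", "Feb", "Feb", "Feb", "Feb")

# `==` recycles the right-hand side: Dec, Jan, Feb, Dec, Jan, Feb
eq <- m == c("Dec", "Jan", "Feb")
eq
# [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE

# `%in%` asks, for each element of m, whether it appears anywhere in the set
isin <- m %in% c("Dec", "Jan", "Feb")
isin
# [1] TRUE TRUE TRUE TRUE TRUE TRUE
```

The FALSE positions in `eq` are exactly the rows that fell through every nested `ifelse()` and ended up as NA in the question's output. Swapping `==` for `%in%` would make the nested `ifelse()` version work, though the lookup vector is shorter and avoids the nesting.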
Data:
library(data.table)

dt <- setDT(read.table(text=" rrp Year Month Finyear hourminute AvgPriceByTOD MonthName
1: 35.27500 1999 1 1999 00:00 33.09037 Jan
2: 21.01167 1999 1 1999 00:00 33.09037 Jan
3: 25.28667 1999 2 1999 00:00 33.09037 Feb
4: 18.42334 1999 2 1999 00:00 33.09037 Feb
5: 16.67499 1999 2 1999 00:00 33.09037 Feb
6: 18.90001 1999 2 1999 00:00 33.09037 Feb",
header = TRUE, stringsAsFactors = FALSE))
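To check the lookup end to end, here is a minimal sketch on a small made-up table. The values in `toy` are illustrative only and are not taken from the original data. It also shows two things the answer does not spell out: the numeric `Month` column could drive the same lookup by going through `month.abb`, and a name that is missing from `seasons` yields NA rather than an error.

```r
library(data.table)

# Same lookup vector as in the answer (southern-hemisphere seasons)
seasons <- rep(c("winter", "spring", "summer", "fall"), each = 3)
names(seasons) <- month.abb[c(6:12, 1:5)]

# Illustrative data only; "Foo" is a deliberately invalid month name
toy <- data.table(Month     = c(1L, 2L, 7L, 10L, NA),
                  MonthName = c("Jan", "Feb", "Jul", "Oct", "Foo"))

# Lookup by month name; unname() drops the names carried over from `seasons`
toy[, season := unname(seasons[MonthName])]

# Equivalent lookup from the numeric month, going through month.abb
toy[, season_from_num := unname(seasons[month.abb[Month]])]

toy$season
# [1] "summer" "summer" "winter" "spring" NA
```

Indexing a named vector is vectorised, so the whole column is filled in a single pass without any nested conditionals.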