R: replacing NA values with the per-hour mean using dplyr
Question
I'm learning the dplyr package in R and I really like it. But now I'm dealing with NA values in my data.
I would like to replace each NA with the mean profit for the corresponding hour. Here is a very simple example:
#create an example
day = c(1, 1, 2, 2, 3, 3)
hour = c(8, 16, 8, 16, 8, 16)
profit = c(100, 200, 50, 60, NA, NA)
shop.data = data.frame(day, hour, profit)
#calculate the average for each hour
library(dplyr)
mean.profit <- shop.data %>%
  group_by(hour) %>%
  summarize(mean = mean(profit, na.rm = TRUE))
> mean.profit
Source: local data frame [2 x 2]
  hour mean
1    8   75
2   16  130
Can I use a dplyr command to replace the NAs for day 3 in profit with 75 (for 8:00) and 130 (for 16:00)?
Recommended answer
Try:
shop.data %>%
  group_by(hour) %>%
  mutate(profit = ifelse(is.na(profit), mean(profit, na.rm = TRUE), profit))
#  day hour profit
#1   1    8    100
#2   1   16    200
#3   2    8     50
#4   2   16     60
#5   3    8     75
#6   3   16    130
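This works because, after group_by(hour), the call to mean() inside mutate() is evaluated once per hour group, so each NA receives the mean of its own hour. The result of that pipeline is still grouped by hour. The following is a minimal self-contained sketch, assuming later steps should operate on the whole table again; the name filled and the added ungroup() call are illustrative and not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
shop.data <- data.frame(
  day    = c(1, 1, 2, 2, 3, 3),
  hour   = c(8, 16, 8, 16, 8, 16),
  profit = c(100, 200, 50, 60, NA, NA)
)

filled <- shop.data %>%
  group_by(hour) %>%
  # mean() runs once per hour group, so each NA gets its own hour's mean
  mutate(profit = ifelse(is.na(profit), mean(profit, na.rm = TRUE), profit)) %>%
  # drop the grouping so later verbs operate on the whole table
  ungroup()

filled$profit
# [1] 100 200  50  60  75 130
```

If an hour has no observed profit at all, mean(profit, na.rm = TRUE) returns NaN for that group, so those rows would be filled with NaN rather than a number.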
Or you can use replace:
shop.data %>%
  group_by(hour) %>%
  mutate(profit = replace(profit, is.na(profit), mean(profit, na.rm = TRUE)))
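Since the question already builds a table of per-hour means, another possible route is to join that table back onto the data and fill the gaps with coalesce(). This is only a sketch of an alternative, not the approach from the answer above; the column name hour.mean and the result name filled are assumptions chosen for illustration (the question calls the column mean).

```r
library(dplyr)

# Same example data as in the question
shop.data <- data.frame(
  day    = c(1, 1, 2, 2, 3, 3),
  hour   = c(8, 16, 8, 16, 8, 16),
  profit = c(100, 200, 50, 60, NA, NA)
)

# Per-hour means, as in the question (column renamed to hour.mean here)
mean.profit <- shop.data %>%
  group_by(hour) %>%
  summarize(hour.mean = mean(profit, na.rm = TRUE))

filled <- shop.data %>%
  # attach each row's hourly mean as an extra column
  left_join(mean.profit, by = "hour") %>%
  # keep profit where present, otherwise fall back to the hourly mean
  mutate(profit = coalesce(profit, hour.mean)) %>%
  # remove the helper column again
  select(-hour.mean)
```

The join version keeps the summary table around for inspection or reuse, at the cost of an extra step compared with the grouped mutate().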