Problem description
Below is my scenario.
Scenario
I have two data frames. The first contains data about system usage, and the second contains data about system locations. I would like to track instrument usage based on the date a system was used and the location where the instrument sits. To do this, I perform an outer join on the data frames using the dplyr library. Next, I want a frequency count of the systems based on date, so I use group_by on Systems and Locations. If a system is not in use, its frequency count should be 0. However, consider Sys6 at loc3: the instrument is not in use (no date, so assume not in use), so its frequency count should be 0, because the Date and Users columns contain no data for it. Instead, the code below returns a frequency count of 1. I am not sure what is wrong. The current and expected output are shown below.
The explanation above is accompanied by the code below.
Dataframe 1:
df <- data.frame("Users" =c('A',"B","A",'C','B'), "Date" = c('17-03-2019','15-03-2019','11-03-2019','20-04-2019',"21-04-2019"), "Systems" = c("Sys1", "Sys1","Sys2","Sys3","Sys4"), stringsAsFactors = FALSE)
df
Users Date Systems
1 A 17-03-2019 Sys1
2 B 15-03-2019 Sys1
3 A 11-03-2019 Sys2
4 C 20-04-2019 Sys3
5 B 21-04-2019 Sys4
Dataframe 2:
loc_df<-data.frame("Locations" =c('loc1','loc1','loc2','loc2','loc3'),"Systems" = c("Sys1","Sys2","Sys3","Sys4","Sys6"), stringsAsFactors = FALSE)
loc_df
Locations Systems
1 loc1 Sys1
2 loc1 Sys2
3 loc2 Sys3
4 loc2 Sys4
5 loc3 Sys6
Code for frequency count:
library(dplyr)
# Merge the usage and location data with a full (outer) join
merge_df <- full_join(df, loc_df, by = "Systems")
# Replace NAs with 0
merge_df[is.na(merge_df)] <- 0
merge_df
# Frequency count per system and location
merge_df %>%
  group_by(Systems, Locations) %>%
  summarise(frequency = n())
Current output:
Systems Locations frequency
<chr> <chr> <int>
1 Sys1 loc1 2
2 Sys2 loc1 1
3 Sys3 loc2 1
4 Sys4 loc2 1
5 Sys6 loc3 1
Expected output:
Systems Locations frequency
<chr> <chr> <int>
1 Sys1 loc1 2
2 Sys2 loc1 1
3 Sys3 loc2 1
4 Sys4 loc2 1
5 Sys6 loc3 0
Recommended answer
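n() returns the number of rows in each group, and after the full join a row for Sys6 is already present even though it has no usage data, so n() counts it as 1. As the NAs have already been changed to 0 (merge_df[is.na(merge_df)] <- 0), we can instead do a logical evaluation on Date and take the sum, which counts only the rows that hold a real date: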
library(dplyr)
merge_df %>%
group_by(Systems, Locations) %>%
summarise(frequency = sum(Date != 0))
# A tibble: 5 x 3
# Groups: Systems [5]
# Systems Locations frequency
# <chr> <chr> <int>
#1 Sys1 loc1 2
#2 Sys2 loc1 1
#3 Sys3 loc2 1
#4 Sys4 loc2 1
#5 Sys6 loc3 0
Instead of changing the NAs to 0, this could also be done with sum(!is.na(Date)) on the joined data, as NA is a more appropriate marker for missing values than 0.
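A minimal sketch of that NA-based variant, using the data from the question. It assumes the join is done with dplyr's full_join (by = "Systems") and that dplyr 1.0.0 or later is installed for the .groups argument. The names merge_na and freq are only illustrative and do not appear in the original post.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Users   = c("A", "B", "A", "C", "B"),
  Date    = c("17-03-2019", "15-03-2019", "11-03-2019", "20-04-2019", "21-04-2019"),
  Systems = c("Sys1", "Sys1", "Sys2", "Sys3", "Sys4"),
  stringsAsFactors = FALSE
)
loc_df <- data.frame(
  Locations = c("loc1", "loc1", "loc2", "loc2", "loc3"),
  Systems   = c("Sys1", "Sys2", "Sys3", "Sys4", "Sys6"),
  stringsAsFactors = FALSE
)

# Full (outer) join; Sys6 has no usage rows, so its Users and Date stay NA.
# merge_na is an illustrative name: the NAs are deliberately NOT replaced with 0.
merge_na <- full_join(df, loc_df, by = "Systems")

# Count only the rows that carry a real date in each group
freq <- merge_na %>%
  group_by(Systems, Locations) %>%
  summarise(frequency = sum(!is.na(Date)), .groups = "drop")

freq
```

Here Sys6 at loc3 gets a frequency of 0 because its only row has a missing Date, while the other systems keep their counts. Leaving the NAs in place also avoids turning the character columns into a mix of real values and the string "0".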