Remove the yearly value when a month or quarter is missing


Problem description

I have monthly, quarterly, and yearly data for different ids. If the value for any month is missing, we need to flag the quarter that month falls in, and the yearly value as well.

Similarly, when quarterly and yearly values are reported and a quarter is missing, we need to flag the yearly value.

If no monthly values are missing, the quarterly and yearly values should NOT be flagged.

In the table below, filtered for id 1:

Row 2 is the Quarter 1 value. We retain it because no monthly value in that quarter is missing.
Row 6 is the Quarter 2 value. It is flagged because month 4 has a missing value and month 4 belongs to quarter 2.
Row 10 (Q3) is a similar case, because months 7 and 8 are missing. So is row 14 (Q4), because month 12 is missing.
Row 1 is the yearly value. We flag it because that year has months with missing values.
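
The month-to-quarter rule used in these rows (month 4 falls in quarter 2, month 12 in quarter 4) can be written as integer arithmetic. This is a minimal base R sketch; the vector names `month` and `quarter` are chosen only for illustration.

```r
# Months 1-12 map to quarters 1-4 in blocks of three.
month <- 1:12
quarter <- (month - 1L) %/% 3L + 1L

# Pair each month with its quarter for inspection.
data.frame(month, quarter)
```

The same mapping can be expressed as `ceiling(month / 3)`, which returns a double rather than an integer.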

Example table: 
# A tibble: 17 x 6
      id value month quarter  year  flag
   <int> <int> <int>   <int> <int> <int>
 1     1  1232    NA      NA  2017     1
 2     1    75    NA       1  2017     0
 3     1    26     1       1  2017     0
 4     1    29     2       1  2017     0
 5     1    20     3       1  2017     0
 6     1    93    NA       2  2017     1
 7     1    NA     4       2  2017     0
 8     1    33     5       2  2017     0
 9     1    35     6       2  2017     0
10     1    51    NA       3  2017     1
11     1    NA     7       3  2017     0
12     1    NA     8       3  2017     0
13     1     3     9       3  2017     0
14     1    55    NA       4  2017     1
15     1    15    10       4  2017     0
16     1    25    11       4  2017     0
17     1    NA    12       4  2017     0


dput(df)
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L), value = c(1232L, 75L, 26L, 29L, 20L, 
93L, NA, 33L, 35L, 51L, NA, NA, 3L, 55L, 15L, 25L, NA, 1232L, 
75L, 26L, 29L, 20L, 93L, 5L, 33L, 35L, 51L, 6L, NA, 3L, 55L, 
15L, 25L, NA, 1232L, 75L, 26L, 29L, NA, 5L, 33L, 35L, 6L, NA, 
3L, 15L, 25L, NA), month = c(NA, NA, 1L, 2L, 3L, NA, 4L, 5L, 
6L, NA, 7L, 8L, 9L, NA, 10L, 11L, 12L, NA, NA, 1L, 2L, 3L, NA, 
4L, 5L, 6L, NA, 7L, 8L, 9L, NA, 10L, 11L, 12L, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), quarter = c(NA, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, NA, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
NA, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), year = c(2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L)), class = "data.frame", row.names = c(NA, -48L))

Desired output

> dput(df_output)
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L), value = c(1232L, 75L, 26L, 29L, 20L, 
93L, NA, 33L, 35L, 51L, NA, NA, 3L, 55L, 15L, 25L, NA, 1232L, 
75L, 26L, 29L, 20L, 93L, 5L, 33L, 35L, 51L, 6L, NA, 3L, 55L, 
15L, 25L, NA, 1232L, 75L, 26L, 29L, NA, 5L, 33L, 35L, 6L, NA, 
3L, 15L, 25L, NA), month = c(NA, NA, 1L, 2L, 3L, NA, 4L, 5L, 
6L, NA, 7L, 8L, 9L, NA, 10L, 11L, 12L, NA, NA, 1L, 2L, 3L, NA, 
4L, 5L, 6L, NA, 7L, 8L, 9L, NA, 10L, 11L, 12L, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), quarter = c(NA, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, NA, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
NA, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), year = c(2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L), flag = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-48L))

This is what I have so far:

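df_output %>% 
  dplyr::group_by(id, year) %>% 
  # Label quarterly summary rows (no month, but a quarter), e.g. "Q_2_2017"
  dplyr::mutate(quarter_d = dplyr::case_when(
    is.na(month) & !is.na(quarter) ~ paste("Q", quarter, year, sep = "_")
  )) %>% 
  # Label the quarter of every row whose value is missing
  dplyr::mutate(quarter_flag = dplyr::case_when(
    is.na(value) ~ paste("Q", ceiling(as.numeric(month) / 3), year, sep = "_")
  ))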

Recommended answer

You can check for NA values first for each year and then for each quarter, and assign 1 if either of those flags is 1.

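library(dplyr)

df %>%
  group_by(id) %>%
  # Yearly flag: 1 on the first row of each id (its yearly row) if any value is missing
  mutate(year_flag = +(any(is.na(value)) & row_number() == 1)) %>%
  group_by(quarter, .add = TRUE) %>%
  # Quarterly flag: 1 on the first row of each quarter (its quarterly row) if any value is missing
  mutate(quarter_flag = +(any(is.na(value)) & row_number() == 1)) %>%
  ungroup() %>%
  # A row is flagged if either flag is set
  mutate(flag = pmax(year_flag, quarter_flag))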

#      id value month quarter  year year_flag quarter_flag  flag
#   <int> <int> <int>   <int> <int>     <int>        <int> <int>
# 1     1  1232    NA      NA  2017         1            0     1
# 2     1    75    NA       1  2017         0            0     0
# 3     1    26     1       1  2017         0            0     0
# 4     1    29     2       1  2017         0            0     0
# 5     1    20     3       1  2017         0            0     0
# 6     1    93    NA       2  2017         0            1     1
# 7     1    NA     4       2  2017         0            0     0
# 8     1    33     5       2  2017         0            0     0
# 9     1    35     6       2  2017         0            0     0
#10     1    51    NA       3  2017         0            1     1
# … with 38 more rows
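
Two idioms in the pipeline above may be unfamiliar: the unary `+` coerces a logical vector to integer, and `pmax()` takes the element-wise maximum of its arguments. A small base R illustration follows; the three-element vectors are made up for this example and are not taken from the data.

```r
# Unary plus turns TRUE/FALSE into 1/0.
year_flag    <- +(c(TRUE, FALSE, FALSE))
quarter_flag <- +(c(FALSE, TRUE, FALSE))

# Element-wise maximum: 1 wherever either flag is 1.
flag <- pmax(year_flag, quarter_flag)
flag
```

Because both flags only take the values 0 and 1, `pmax()` here behaves like a logical OR that keeps the result numeric.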

I have kept the additional columns year_flag and quarter_flag so that you can see what is going on. You can remove them from the final output if they are not needed.
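
The pipeline above relies on the yearly row being the first row of each id and the quarterly row being the first row of each quarter. As a variation, here is a sketch that identifies row types from the data instead. It assumes (this is not stated in the question) that a yearly row has both `month` and `quarter` missing, and that a quarterly row has `month` missing but `quarter` present. The helper columns `year_missing` and `quarter_missing` are names introduced here, and the data frame is rebuilt compactly from the values in `dput(df)`.

```r
library(dplyr)

# Rebuild the question's 48-row example (values copied from dput(df)).
df <- data.frame(
  id = rep(1:3, times = c(17L, 17L, 14L)),
  value = c(
    1232L, 75L, 26L, 29L, 20L, 93L, NA, 33L, 35L, 51L, NA, NA, 3L, 55L, 15L, 25L, NA,
    1232L, 75L, 26L, 29L, 20L, 93L, 5L, 33L, 35L, 51L, 6L, NA, 3L, 55L, 15L, 25L, NA,
    1232L, 75L, 26L, 29L, NA, 5L, 33L, 35L, 6L, NA, 3L, 15L, 25L, NA
  ),
  month = c(rep(c(NA, NA, 1:3, NA, 4:6, NA, 7:9, NA, 10:12), 2), rep(NA, 14)),
  quarter = c(rep(c(NA, rep(1:4, each = 4)), 2),
              NA, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  year = 2017L
)

flagged <- df %>%
  group_by(id, year) %>%
  # TRUE on every row of an id/year that has at least one missing value
  mutate(year_missing = any(is.na(value))) %>%
  group_by(quarter, .add = TRUE) %>%
  # TRUE on every row of a quarter in which a monthly row lacks its value
  mutate(quarter_missing = any(is.na(value) & !is.na(month))) %>%
  ungroup() %>%
  mutate(
    flag = case_when(
      is.na(month) & is.na(quarter) ~ as.integer(year_missing),     # yearly row
      is.na(month)                  ~ as.integer(quarter_missing),  # quarterly row
      TRUE                          ~ 0L                            # monthly row
    )
  ) %>%
  # Drop the helper columns once the flag is computed
  select(-year_missing, -quarter_missing)

flagged
```

Under these assumptions the result does not depend on row order. For id 3, which has no monthly rows, only the yearly row is flagged, which is what the `flag` column of `df_output` shows.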
