Wrong totals after filtering or subsetting dates in a data frame.

I am using the googleAnalyticsR package to pull AdWords cost data (adCost) from the GA API.

The adCost total matches GA itself. But after subsetting the data by month, I get a different result.

This is a plain API call that returns a data frame with two columns: date and adCost. The result matches the data in the GA platform.

library(googleAnalyticsR)
library(dplyr)

start_date <- "2018-01-01"
final_date <- "2018-02-28"


data <- google_analytics(view_id,
                         date_range = c(start_date, final_date),
                         metrics = c("adCost"),
                         dimensions = c("date"),
                         anti_sample = TRUE)


Expected output of sum(data$adCost):

20632.19

Result:

20632.19

However, if I subset or filter the data to a single month (February, for example), I do not get the figure that GA shows on the platform.

data_feb <- data %>%
            filter(date >= "2018-02-01", date <= "2018-02-28")
            #subset(date >= "2018-02-01", date <= "2018-02-28") gives same incorrect result


Expected output of sum(data_feb$adCost):

10703.57

Returned value:

10537.1
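
A quick sanity check on the filter itself is to compare against explicit Date objects and look at which rows survive. This is only a sketch in base R on a small made-up data frame (the `toy` values are hypothetical, not the GA export):

```r
# Toy data frame: hypothetical values, not the GA export
toy <- data.frame(
  date = seq(as.Date("2018-01-30"), as.Date("2018-02-02"), by = "day"),
  adCost = c(1, 2, 4, 8)
)

# Compare against Date objects instead of relying on string coercion
feb_start <- as.Date("2018-02-01")
feb_end <- as.Date("2018-02-28")
toy_feb <- toy[toy$date >= feb_start & toy$date <= feb_end, ]

range(toy_feb$date)  # first and last day kept by the filter
nrow(toy_feb)        # number of rows kept
sum(toy_feb$adCost)  # total over the kept rows
```

If the date range and row count are what you expect, the filter is keeping the right rows and any difference has to come from the values in those rows.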

I even tried using months() to put the month in a new column and filtering by month name, but the result again does not match.

data$month <- months(data$date, abbreviate = TRUE)

data_feb <- data %>%
            filter(month == "feb")


Expected output of sum(data_feb$adCost):

10703.57

Returned value:

10537.1
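
Note that months() returns locale-dependent names, so a literal such as "feb" only matches in some locales. A possible locale-independent variant, sketched on a small made-up data frame (the `toy` values are hypothetical), extracts a numeric year-month key with format():

```r
# Toy data frame: hypothetical values, not the GA export
toy <- data.frame(
  date = as.Date(c("2018-01-31", "2018-02-01", "2018-02-28")),
  adCost = c(1, 2, 4)
)

# "%Y-%m" yields keys such as "2018-02" regardless of locale
toy$month <- format(toy$date, "%Y-%m")
toy_feb <- toy[toy$month == "2018-02", ]

sum(toy_feb$adCost)
```

This selects the same rows as the date-range filter, so it would not be expected to change the total.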

What could be going on?

Data:

data <- structure(list(date = structure(c(17532, 17533, 17534, 17535,
    17536, 17537, 17538, 17539, 17540, 17541, 17542, 17543, 17544,
    17545, 17546, 17547, 17548, 17549, 17550, 17551, 17552, 17553,
    17554, 17555, 17556, 17557, 17558, 17559, 17560, 17561, 17562,
    17563, 17564, 17565, 17566, 17567, 17568, 17569, 17570, 17571,
    17572, 17573, 17574, 17575, 17576, 17577, 17578, 17579, 17580,
    17581, 17582, 17583, 17584, 17585, 17586, 17587, 17588, 17589,
    17590), class = "Date"), adCost = c(0, 0, 212.788901, 201.660582,
    677.926913, 526.440256, 522.998839, 135.469596, 234.080656, 173.389505,
    299.499735, 234.691749, 235.785283, 534.545275, 19.136849, 290.011717,
    545.737919, 730.416558, 550.047731, 508.84722, 246.463323, 315.741935,
    310.338589, 417.858737, 312.525658, 4.953066, 189.020612, 724.337794,
    65.547729, 199.248374, 675.579031, 374.50332, 429.758963, 624.922665,
    137.785316, 238.551281, 471.924357, 353.758332, 176.251992, 355.109168,
    0, 0, 178.406897, 491.44716, 540.624039, 601.797631, 543.518688,
    254.214552, 264.345825, 240.127257, 781.458877, 704.10741, 650.427743,
    355.109168, 181.719663, 178.246083, 356.202702, 501.385456, 551.398567
    )), .Names = c("date", "adCost"), row.names = c(NA, 59L), class = "data.frame", totals = list(
        structure(list(adCost = "20632.193244"), .Names = "adCost")), minimums = list(
        structure(list(adCost = "0.0"), .Names = "adCost")), maximums = list(
        structure(list(adCost = "781.458877"), .Names = "adCost")), isDataGolden = TRUE, rowCount = 59L)

Best answer

Your approach looks correct. However, the expected output of sum(data_feb$adCost) should be 10537.1. You can quickly verify this with the following.

data_feb <- data %>%
  filter(date >= "2018-02-01", date <= "2018-02-28")

data_Jan <- data %>%
  filter(date < "2018-02-01")

sum(data_Jan$adCost)

[1] 10095.09

sum(data_Jan$adCost) + sum(data_feb$adCost)

[1] 20632.19
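
The same check can be run for every month in one step. Below is a minimal sketch in base R that rebuilds the data frame from the adCost values in the question and sums by calendar month with tapply(). It assumes one row per day starting on 2018-01-01, which is what the dput() output encodes, and reconstructs the dates with seq() rather than copying the numeric date codes.

```r
# adCost values copied from the dput() output in the question
adCost <- c(0, 0, 212.788901, 201.660582,
  677.926913, 526.440256, 522.998839, 135.469596, 234.080656, 173.389505,
  299.499735, 234.691749, 235.785283, 534.545275, 19.136849, 290.011717,
  545.737919, 730.416558, 550.047731, 508.84722, 246.463323, 315.741935,
  310.338589, 417.858737, 312.525658, 4.953066, 189.020612, 724.337794,
  65.547729, 199.248374, 675.579031, 374.50332, 429.758963, 624.922665,
  137.785316, 238.551281, 471.924357, 353.758332, 176.251992, 355.109168,
  0, 0, 178.406897, 491.44716, 540.624039, 601.797631, 543.518688,
  254.214552, 264.345825, 240.127257, 781.458877, 704.10741, 650.427743,
  355.109168, 181.719663, 178.246083, 356.202702, 501.385456, 551.398567)

# Assumption: one row per day, starting 2018-01-01
dates <- seq(as.Date("2018-01-01"), by = "day", length.out = length(adCost))
data <- data.frame(date = dates, adCost = adCost)

# Sum adCost per calendar month, keyed by "YYYY-MM"
monthly <- tapply(data$adCost, format(data$date, "%Y-%m"), sum)

round(monthly, 2)  # 2018-01: 10095.09, 2018-02: 10537.10
sum(monthly)       # equals sum(data$adCost)
```

The two monthly sums add up to the overall total, so the filter is not dropping or duplicating rows. If the GA interface still shows a different February figure, the difference is already present in the daily rows returned by the API rather than introduced by the subsetting.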
