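Wrong totals after filtering or subsetting dates in a data frame.
I am using the googleAnalyticsR package to pull AdWords cost data (adCost) from the GA API.
The adCost total matches GA itself, but after subsetting the data by month I get different results.
This is a plain API call that returns a data frame with two columns: date and adCost. The result matches the data shown in the GA platform.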
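library(googleAnalyticsR)
library(dplyr)  # provides %>% and filter(), used further down

start_date <- "2018-01-01"
final_date <- "2018-02-28"
# view_id holds the GA view ID and is defined elsewhere
data <- google_analytics(view_id,
                         date_range = c(start_date, final_date),
                         metrics = c("adCost"),
                         dimensions = c("date"),
                         anti_sample = TRUE)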
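Expected output of sum(data$adCost):
20632.19
Result:
20632.19
However, if I subset or filter the data down to a single month (February, for example), I do not get the figure that GA shows in the platform.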
data_feb <- data %>%
  filter(date >= "2018-02-01", date <= "2018-02-28")
# subset(date >= "2018-02-01", date <= "2018-02-28") gives the same incorrect result
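Expected output of sum(data_feb$adCost):
10703.57
Returned value:
10537.1
I even tried using months() to put the month in a new column and filtering by month name, but the result still does not match.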
# months() returns locale-dependent names, so "feb" only matches in some locales
data$month <- months(data$date, abbreviate = TRUE)
data_feb <- data %>%
  filter(month == "feb")
Expected output of sum(data_feb$adCost):
10703.57
Returned value:
10537.1
What could be causing this?
Data:
data <- structure(list(date = structure(c(17532, 17533, 17534, 17535,
17536, 17537, 17538, 17539, 17540, 17541, 17542, 17543, 17544,
17545, 17546, 17547, 17548, 17549, 17550, 17551, 17552, 17553,
17554, 17555, 17556, 17557, 17558, 17559, 17560, 17561, 17562,
17563, 17564, 17565, 17566, 17567, 17568, 17569, 17570, 17571,
17572, 17573, 17574, 17575, 17576, 17577, 17578, 17579, 17580,
17581, 17582, 17583, 17584, 17585, 17586, 17587, 17588, 17589,
17590), class = "Date"), adCost = c(0, 0, 212.788901, 201.660582,
677.926913, 526.440256, 522.998839, 135.469596, 234.080656, 173.389505,
299.499735, 234.691749, 235.785283, 534.545275, 19.136849, 290.011717,
545.737919, 730.416558, 550.047731, 508.84722, 246.463323, 315.741935,
310.338589, 417.858737, 312.525658, 4.953066, 189.020612, 724.337794,
65.547729, 199.248374, 675.579031, 374.50332, 429.758963, 624.922665,
137.785316, 238.551281, 471.924357, 353.758332, 176.251992, 355.109168,
0, 0, 178.406897, 491.44716, 540.624039, 601.797631, 543.518688,
254.214552, 264.345825, 240.127257, 781.458877, 704.10741, 650.427743,
355.109168, 181.719663, 178.246083, 356.202702, 501.385456, 551.398567
)), .Names = c("date", "adCost"), row.names = c(NA, 59L), class = "data.frame", totals = list(
structure(list(adCost = "20632.193244"), .Names = "adCost")), minimums = list(
structure(list(adCost = "0.0"), .Names = "adCost")), maximums = list(
structure(list(adCost = "781.458877"), .Names = "adCost")), isDataGolden = TRUE, rowCount = 59L)
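Best answer
Your approach looks correct. The expected output of sum(data_feb$adCost) for the posted data is in fact 10537.1, not 10703.57. You can verify this quickly by checking that the January and February subtotals add up to the overall total: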
data_feb <- data %>%
  filter(date >= "2018-02-01", date <= "2018-02-28")
data_Jan <- data %>%
  filter(date < "2018-02-01")
sum(data_Jan$adCost)
[1] 10095.09
sum(data_feb$adCost)
[1] 10537.1
sum(data_Jan$adCost) + sum(data_feb$adCost)
[1] 20632.19
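The same check can be done without dplyr and without relying on locale-specific month names. The following is only a minimal base R sketch: it rebuilds the posted values by hand (the names ad_cost and monthly are illustrative, not from the question) and groups them by a year-month key built with format().

```r
# Rebuild the 59 daily values posted in the question (2018-01-01 to 2018-02-28).
ad_cost <- c(0, 0, 212.788901, 201.660582, 677.926913, 526.440256, 522.998839,
             135.469596, 234.080656, 173.389505, 299.499735, 234.691749,
             235.785283, 534.545275, 19.136849, 290.011717, 545.737919,
             730.416558, 550.047731, 508.84722, 246.463323, 315.741935,
             310.338589, 417.858737, 312.525658, 4.953066, 189.020612,
             724.337794, 65.547729, 199.248374, 675.579031, 374.50332,
             429.758963, 624.922665, 137.785316, 238.551281, 471.924357,
             353.758332, 176.251992, 355.109168, 0, 0, 178.406897, 491.44716,
             540.624039, 601.797631, 543.518688, 254.214552, 264.345825,
             240.127257, 781.458877, 704.10741, 650.427743, 355.109168,
             181.719663, 178.246083, 356.202702, 501.385456, 551.398567)

data <- data.frame(
  date   = seq(as.Date("2018-01-01"), as.Date("2018-02-28"), by = "day"),
  adCost = ad_cost
)

# Group by a "YYYY-MM" key; unlike months(), this does not depend on the locale.
monthly <- tapply(data$adCost, format(data$date, "%Y-%m"), sum)
print(round(monthly, 2))

# The monthly subtotals should add back up to the overall total.
stopifnot(isTRUE(all.equal(sum(monthly), sum(data$adCost))))
```

If the February figure here still differs from what GA shows, the gap is not introduced by the filtering step, since the subtotals are consistent with the data the API returned.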