问题描述
我有一个数据集,如下所示.我的实际数据集有5000列:
I have a dataset as follows as a sample. My actual dataset has 5000 columns:
# Define Adstock Rate
adstock_rate = 0.50
lag_number = 3
# Create Data
advertising = c(117.913, 120.112, 125.828, 115.354, 177.090, 141.647, 137.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 158.511, 109.385, 91.084, 79.253, 102.706,
78.494, 135.114, 114.549, 87.337, 107.829, 125.020, 82.956, 60.813, 83.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 129.515, 105.486, 111.494, 107.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
134.913, 123.112, 178.828, 112.354, 100.090, 167.647, 177.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 112.511, 155.385, 123.084, 89.253, 67.706,
23.494, 122.114, 112.549, 65.337, 134.829, 123.020, 81.956, 23.813, 65.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 145.515, 154.486, 121.494, 117.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000
)
advertising2 = c(43.913, 231.112, 76.828, 22.354, 98.090, 123.647, 90.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 234.511, 143.385, 78.084, 89.253, 12.706,
34.494, 56.114, 78.549, 12.337, 67.829, 42.020, 90.956, 23.813, 83.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 52.515, 76.486, 89.494, 12.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
67.913, 12.112, 45.828, 78.354, 89.090, 90.647, 23.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 78.511, 23.385, 43.084, 67.253, 33.706,
56.494, 78.114, 98.549, 45.337, 31.829, 67.020, 87.956, 94.813, 65.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 55.515, 32.486, 78.494, 33.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000
)
Region = c(500, 500, 500, 500, 500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,
500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500, 500, 500,
500, 500,
501, 501, 501, 501, 501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,
501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501, 501, 501,
501, 501)
advertising_dataset<-data.frame(cbind(Region, advertising, advertising2))
我的数据集如下:
head(advertising_dataset, 15)
Region advertising advertising2
1 500 117.913 43.913
2 500 120.112 231.112
3 500 125.828 76.828
4 500 115.354 22.354
5 500 177.090 98.090
6 500 141.647 123.647
7 500 137.892 90.892
8 500 0.000 0.000
9 500 0.000 0.000
10 500 0.000 0.000
11 500 0.000 0.000
12 500 0.000 0.000
13 500 0.000 0.000
14 500 0.000 0.000
15 500 0.000 0.000
然后仅在 1 列中创建一个for循环,然后在Region
之后创建一个group_by函数.
A for-loop is then created to only 1 column and then a group_by function after that by Region
.
foo <- function(df_, lag_val = 1) {
df_$adstocked_advertising = df_$advertising
for (i in (1 + lag_val):nrow(df_)) {
df_$adstocked_advertising[i] = df_$advertising[i] + adstock_rate *
df_$adstocked_advertising[i - lag_val]
}
return(df_)
}
adv_2 <- data.frame(advertising_dataset %>%
group_by(Region) %>%
do(foo(data.frame(.), lag_val = 3)))
如何将上述包含adv_2
的函数应用于2:ncol(advertising_dataset)中的所有列,而不只是应用于advertising
列?
How do I apply the above functions including adv_2
to all columns from 2:ncol(advertising_dataset) rather than just the advertising
column?
最后我的最终列数应该增加一倍,因为将为每个现有列创建一个新修订的列.
My final number of columns should double in the end because a newly revised column will be created for every existing column.
使用上述功能,我感觉这是沿着这些思路的事情:
I have a feeling it is something along these lines, with the function I above:
data.frame(advertising_dataset[1],
apply(advertising_dataset[2:ncol(advertising_dataset)],2, foo) )
任何帮助都会很棒,谢谢!
Any help would be great, thanks!
推荐答案
我们可以将accumulate
与mutate_all
library(tidyverse)
out <- advertising_dataset %>%
group_by(Region) %>%
mutate_all(funs(adstocked = accumulate(., ~ .y + adstock_rate * .x)))
out
# A tibble: 104 x 5
# Groups: Region [2]
# Region advertising advertising2 advertising_adstocked advertising2_adstocked
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 500 118. 43.9 118. 43.9
# 2 500 120. 231. 179. 253.
# 3 500 126. 76.8 215. 203.
# 4 500 115. 22.4 223. 124.
# 5 500 177. 98.1 289. 160.
# 6 500 142. 124. 286. 204.
# 7 500 138. 90.9 281. 193.
# 8 500 0 0 140. 96.4
# 9 500 0 0 70.2 48.2
#10 500 0 0 35.1 24.1
# ... with 94 more rows
检查OP解决方案的输出
Checking with the output from OP's solution
head(out[[4]])
#[1] 117.9130 179.0685 215.3623 223.0351 288.6076 285.9508
head(adv_2[[4]])
#[1] 117.9130 179.0685 215.3623 223.0351 288.6076 285.9508
更新
我们可以针对不同的lag_val
foo1 <- function(dot, lag_val = 1) {
tmp <- dot
for(i in (1 + lag_val): length(tmp)) {
tmp[i] <- tmp[i] + adstock_rate * tmp[i - lag_val]
}
return(tmp)
}
advertising_dataset %>%
group_by(Region) %>%
mutate_all(funs(adstocked = foo1(., lag_val = 1)))
# A tibble: 104 x 5
# Groups: Region [2]
# Region advertising advertising2 advertising_adstocked advertising2_adstocked
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 500 118. 43.9 118. 43.9
# 2 500 120. 231. 179. 253.
# 3 500 126. 76.8 215. 203.
# 4 500 115. 22.4 223. 124.
# 5 500 177. 98.1 289. 160.
# 6 500 142. 124. 286. 204.
# 7 500 138. 90.9 281. 193.
# 8 500 0 0 140. 96.4
# 9 500 0 0 70.2 48.2
#10 500 0 0 35.1 24.1
# ... with 94 more rows
-更改lag_val
advertising_dataset %>%
group_by(Region) %>%
mutate_all(funs(adstocked = foo1(., lag_val = 2)))
# A tibble: 104 x 5
# Groups: Region [2]
# Region advertising advertising2 advertising_adstocked advertising2_adstocked
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 500 118. 43.9 118. 43.9
# 2 500 120. 231. 120. 231.
# 3 500 126. 76.8 185. 98.8
# 4 500 115. 22.4 175. 138.
# 5 500 177. 98.1 269. 147.
# 6 500 142. 124. 229. 193.
# 7 500 138. 90.9 273. 165.
# 8 500 0 0 115. 96.3
# 9 500 0 0 136. 82.3
#10 500 0 0 57.3 48.2
# ... with 94 more rows
这篇关于具有现有按行的按行的按行按列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!