I'm new to R. Here is a small sample of my data:
PTS_TeamHome <- c(101,87,94,110,95)
PTS_TeamAway <- c(95,89,105,111,121)
TeamHome <- c("LAL", "HOU", "SAS", "MIA", "LAL")
TeamAway <- c("IND", "LAL", "LAL", "HOU", "NOP")
df <- data.frame(cbind(TeamHome, TeamAway,PTS_TeamHome,PTS_TeamAway))
df
  TeamHome TeamAway PTS_TeamHome PTS_TeamAway
1      LAL      IND          101           95
2      HOU      LAL           87           89
3      SAS      LAL           94          105
4      MIA      HOU          110          111
5      LAL      NOP           95          121
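Imagine these are the first five games of a season with 1230 games in total. For every game, I want to compute the average points per game that the home team and the away team have scored up to and including that game.
The output should look like this: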
  TeamHome TeamAway PTS_TeamHome PTS_TeamAway HOMETEAM_AVGCUMPTS ROADTEAM_AVGCUMPTS
1      LAL      IND          101           95                101                 95
2      HOU      LAL           87           89                 87                 95
3      SAS      LAL           94          105                 94              98.33
4      MIA      HOU          110          111                110                 99
5      LAL      NOP           95          121               97.5                121
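Note how the formula works for the home team in the fifth game. Since LAL is the home team, it looks up every score LAL has posted so far, whether playing at home or on the road. In this case (101 + 89 + 105 + 95) / 4 = 97.5.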
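The arithmetic for that fifth row can be checked directly in R. This is only a small sketch: the vector name lal_points is made up for illustration, and the four values are LAL's scores from games 1, 2, 3 and 5 in the table above.

```r
# Points LAL scored in every game up to and including game 5,
# regardless of whether LAL was the home team or the road team
lal_points <- c(101, 89, 105, 95)

# Running average = total points so far / number of games played so far
lal_avg <- sum(lal_points) / length(lal_points)
lal_avg  # 97.5
```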
This is what I tried, without success:
lst <- list()
for(i in 1:nrow(df)) lst[[i]] <- ( cumsum(df[which(df$TEAM1[1:i]==df$TEAM1[i]),df$PTS_TeamAway,0])
+ cumsum(df[which(df$TEAM2[1:i]==df$TEAM1[i]),df$PTS_TeamHome,0]) )
/ #divided by number of games
df$HOMETEAM_AVGCUMPTS <- unlist(lst)
My idea was to compute the cumulative PTS and then divide by the number of games played, but none of these approaches worked.
Best answer
lst <- list()
for(i in 1:nrow(df)) lst[[i]] <- mean(c(df$PTS_TeamHome[1:i][df$TeamHome[1:i] == df$TeamHome[i]],
df$PTS_TeamAway[1:i][df$TeamAway[1:i] == df$TeamHome[i]]))
df$HOMETEAM_AVGCUMPTS <- unlist(lst)
lst2 <- list()
for(i in 1:nrow(df)) lst2[[i]] <- mean(c(df$PTS_TeamAway[1:i][df$TeamAway[1:i] == df$TeamAway[i]],
df$PTS_TeamHome[1:i][df$TeamHome[1:i] == df$TeamAway[i]]))
df$ROADTEAM_AVGCUMPTS <- unlist(lst2)
df
#   TeamHome TeamAway PTS_TeamHome PTS_TeamAway HOMETEAM_AVGCUMPTS ROADTEAM_AVGCUMPTS
# 1      LAL      IND          101           95                101                 95
# 2      HOU      LAL           87           89                 87                 95
# 3      SAS      LAL           94          105                 94           98.33333
# 4      MIA      HOU          110          111                110                 99
# 5      LAL      NOP           95          121               97.5                121
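The approach uses two loops, one per output column. In each iteration we take the mean of two vectors, combined in the form mean(c(vec1, vec2)). In the first loop, the first vector holds the points the home team scored when playing at home (team in column 1, points in column 3), and the second vector holds the points that same team scored when playing away (team in column 2, points in column 4). The second loop does the same for the road team.

We use a for loop because it makes it easy to control how many rows the subset considers. With df$PTS_TeamHome[1:i], the set is limited to past games plus the current one. We then subset that vector with [df$TeamHome[1:i] == df$TeamHome[i]]. In plain language this means "the entries of TeamHome up to the current game that are equal to the home team of the current game." With these conditions, "future" games cannot contaminate the calculation.

For the data, I set the stringsAsFactors argument to FALSE and converted the points columns to class numeric. See below.

Data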
PTS_TeamHome <- c(101,87,94,110,95)
PTS_TeamAway <- c(95,89,105,111,121)
TeamHome <- c("LAL", "HOU", "SAS", "MIA", "LAL")
TeamAway <- c("IND", "LAL", "LAL", "HOU", "NOP")
df <- data.frame(cbind(TeamHome, TeamAway, PTS_TeamHome, PTS_TeamAway), stringsAsFactors = FALSE)
# cbind() turned every column into character, so convert the points back to numeric
df[3:4] <- lapply(df[3:4], as.numeric)
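To make the subsetting more concrete, here is a sketch that first walks through the fifth game step by step and then folds the two loops into one helper. It is a variation, not the code from the answer: the function name running_avg and the variables home_pts, away_pts, home_avg and road_avg are made up for illustration, vapply() is swapped in for the list plus unlist() pattern, and the data frame is built directly with data.frame() instead of cbind(), so the points columns are numeric from the start.

```r
# Same data as above, built directly so the points columns stay numeric
df <- data.frame(
  TeamHome = c("LAL", "HOU", "SAS", "MIA", "LAL"),
  TeamAway = c("IND", "LAL", "LAL", "HOU", "NOP"),
  PTS_TeamHome = c(101, 87, 94, 110, 95),
  PTS_TeamAway = c(95, 89, 105, 111, 121),
  stringsAsFactors = FALSE
)

# Step by step for the fifth game, where LAL is the home team
i <- 5
# Points LAL scored at home in games 1..5
home_pts <- df$PTS_TeamHome[1:i][df$TeamHome[1:i] == df$TeamHome[i]]
# Points LAL scored on the road in games 1..5
away_pts <- df$PTS_TeamAway[1:i][df$TeamAway[1:i] == df$TeamHome[i]]
mean(c(home_pts, away_pts))

# Hypothetical helper: average points of `team` over rows 1..i of `d`
running_avg <- function(team, i, d) {
  rows <- seq_len(i)  # only past games and the current one
  mean(c(d$PTS_TeamHome[rows][d$TeamHome[rows] == team],
         d$PTS_TeamAway[rows][d$TeamAway[rows] == team]))
}

# One value per game for the home team and for the road team
home_avg <- vapply(seq_len(nrow(df)),
                   function(i) running_avg(df$TeamHome[i], i, df),
                   numeric(1))
road_avg <- vapply(seq_len(nrow(df)),
                   function(i) running_avg(df$TeamAway[i], i, df),
                   numeric(1))

df$HOMETEAM_AVGCUMPTS <- home_avg
df$ROADTEAM_AVGCUMPTS <- road_avg
```

Like the loops in the answer, this rescans rows 1 to i for every game, so the work grows roughly quadratically with the number of games. For a 1230-game season that should still be quick, but a grouped cumulative approach might be preferable for much larger data.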