I'm new to R. Here is a small reproducible example of my data:

PTS_TeamHome <- c(101,87,94,110,95)
PTS_TeamAway <- c(95,89,105,111,121)
TeamHome <- c("LAL", "HOU", "SAS", "MIA", "LAL")
TeamAway <- c("IND", "LAL", "LAL", "HOU", "NOP")
df <- data.frame(cbind(TeamHome, TeamAway,PTS_TeamHome,PTS_TeamAway))
df

TeamHome TeamAway PTS_TeamHome PTS_TeamAway
  LAL      IND          101           95
  HOU      LAL           87           89
  SAS      LAL           94          105
  MIA      HOU          110          111
  LAL      NOP           95          121


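Imagine these are the first five games of a season with 1230 games in total. I want to calculate the average points per game (the mean) for the home team and for the away team at any given point in the season.

The output would look like this: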

  TeamHome TeamAway PTS_TeamHome PTS_TeamAway HOMETEAM_AVGCUMPTS ROADTEAM_AVGCUMPTS
1  LAL      IND          101           95                101                 95
2  HOU      LAL           87           89                 87                 95
3  SAS      LAL           94          105                 94              98.33
4  MIA      HOU          110          111                110                 99
5  LAL      NOP           95          121               97.5                121


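Note how the formula works for the home team in the fifth game. Since LAL is the home team, it looks up how many points LAL has scored so far, whether playing at home or on the road. In this case, (101 + 89 + 105 + 95) / 4 = 97.5.

This is what I tried, without success: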

lst <- list()
for(i in 1:nrow(df)) lst[[i]] <- ( cumsum(df[which(df$TEAM1[1:i]==df$TEAM1[i]),df$PTS_TeamAway,0])
                                 + cumsum(df[which(df$TEAM2[1:i]==df$TEAM1[i]),df$PTS_TeamHome,0]) )
                             / #divided by number of games
  df$HOMETEAM_AVGCUMPTS <- unlist(lst)


I wanted to compute the cumulative PTS and then divide by the number of games played, but none of these approaches worked.

Best answer

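# Running average for the home team: for each game i, average every score
# that team has posted (at home or away) in games 1..i.
lst <- list()
for(i in 1:nrow(df)) lst[[i]] <- mean(c(df$PTS_TeamHome[1:i][df$TeamHome[1:i] == df$TeamHome[i]],
                                        df$PTS_TeamAway[1:i][df$TeamAway[1:i] == df$TeamHome[i]]))
df$HOMETEAM_AVGCUMPTS <- unlist(lst)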


# Same idea for the away team: the reference team is now df$TeamAway[i].
lst2 <- list()
for(i in 1:nrow(df)) lst2[[i]] <- mean(c(df$PTS_TeamAway[1:i][df$TeamAway[1:i] == df$TeamAway[i]],
                                        df$PTS_TeamHome[1:i][df$TeamHome[1:i] == df$TeamAway[i]]))
df$ROADTEAM_AVGCUMPTS <- unlist(lst2)


df
#   TeamHome TeamAway PTS_TeamHome PTS_TeamAway HOMETEAM_AVGCUMPTS ROADTEAM_AVGCUMPTS
# 1      LAL      IND          101           95                101                 95
# 2      HOU      LAL           87           89                 87                 95
# 3      SAS      LAL           94          105                 94           98.33333
# 4      MIA      HOU          110          111                110                 99
# 5      LAL      NOP           95          121               97.5                121




The approach uses two loops, one per output column. In each loop we take the mean of two vectors, combined in the form mean(c(vec1, vec2)).
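To make the mean(c(vec1, vec2)) pattern concrete, here is a small sketch that evaluates it by hand for a single row, game 5, where LAL is the home team. The names i, team, vec1 and vec2 are only illustrative and do not appear in the answer's code; the data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data with proper column types
df <- data.frame(
  TeamHome     = c("LAL", "HOU", "SAS", "MIA", "LAL"),
  TeamAway     = c("IND", "LAL", "LAL", "HOU", "NOP"),
  PTS_TeamHome = c(101, 87, 94, 110, 95),
  PTS_TeamAway = c(95, 89, 105, 111, 121),
  stringsAsFactors = FALSE
)

i    <- 5                 # the game we are looking at
team <- df$TeamHome[i]    # home team in game 5: "LAL"

# Points LAL scored as the home team in games 1..5
vec1 <- df$PTS_TeamHome[1:i][df$TeamHome[1:i] == team]   # 101 95

# Points LAL scored as the away team in games 1..5
vec2 <- df$PTS_TeamAway[1:i][df$TeamAway[1:i] == team]   # 89 105

mean(c(vec1, vec2))       # 97.5
```

This matches the (101 + 89 + 105 + 95) / 4 = 97.5 figure from the question.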

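The first vector is the set of points the home team scored when playing at home (team in column 1, points in column 3). The second vector is the set of points that same team scored when playing away (team in column 2, points in column 4). We use a for loop because it makes it easy to control how many rows the subset considers. With df$PTS_TeamHome[1:i], the set is limited to past games plus the current one. We then subset that vector with [df$TeamHome[1:i] == df$TeamHome[i]]. In plain language: "of the teams in the TeamHome column up to the current game, keep those equal to the home team currently playing." With these conditions, "future" games cannot contaminate the analysis.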
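Since the two loops differ only in which column supplies the reference team, the same logic can be wrapped in one function. The following is a sketch rather than part of the original answer: running_avg and its team_col argument are hypothetical names introduced here, and vapply stands in for the for loop plus unlist.

```r
# Rebuild the example data with proper column types
df <- data.frame(
  TeamHome     = c("LAL", "HOU", "SAS", "MIA", "LAL"),
  TeamAway     = c("IND", "LAL", "LAL", "HOU", "NOP"),
  PTS_TeamHome = c(101, 87, 94, 110, 95),
  PTS_TeamAway = c(95, 89, 105, 111, 121),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row i, average all points scored in games 1..i
# by the team named in column `team_col` of row i, home and away combined.
running_avg <- function(df, team_col) {
  vapply(seq_len(nrow(df)), function(i) {
    team <- df[[team_col]][i]
    past <- seq_len(i)   # rows up to and including the current game
    mean(c(df$PTS_TeamHome[past][df$TeamHome[past] == team],
           df$PTS_TeamAway[past][df$TeamAway[past] == team]))
  }, numeric(1))
}

df$HOMETEAM_AVGCUMPTS <- running_avg(df, "TeamHome")
df$ROADTEAM_AVGCUMPTS <- running_avg(df, "TeamAway")
df
```

Like the loops, this rescans rows 1..i for every game, so the work grows quadratically with the number of rows. That should be acceptable for a 1230-game season, but a grouped cumulative sum on long-format data would likely scale better for much larger tables.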



For the data, I set the stringsAsFactors argument to FALSE and converted the points columns to class numeric. See below.

Data

PTS_TeamHome <- c(101,87,94,110,95)
PTS_TeamAway <- c(95,89,105,111,121)
TeamHome <- c("LAL", "HOU", "SAS", "MIA", "LAL")
TeamAway <- c("IND", "LAL", "LAL", "HOU", "NOP")
df <- data.frame(cbind(TeamHome, TeamAway, PTS_TeamHome, PTS_TeamAway), stringsAsFactors = FALSE)
df[3:4] <- lapply(df[3:4], as.numeric)
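The conversion step is needed because cbind() on a mix of character and numeric vectors builds a character matrix, so the points arrive in the data frame as text. The sketch below illustrates that behaviour on two of the columns and shows one way to sidestep it by passing the vectors to data.frame() directly. The names bad and good are illustrative only.

```r
TeamHome     <- c("LAL", "HOU", "SAS", "MIA", "LAL")
PTS_TeamHome <- c(101, 87, 94, 110, 95)

# cbind() coerces everything to character before data.frame() sees it
bad <- data.frame(cbind(TeamHome, PTS_TeamHome), stringsAsFactors = FALSE)
class(bad$PTS_TeamHome)    # "character"

# Passing the vectors directly keeps each column's own type
good <- data.frame(TeamHome, PTS_TeamHome, stringsAsFactors = FALSE)
class(good$PTS_TeamHome)   # "numeric"
```

With the direct data.frame() call, the as.numeric() conversion is no longer required.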
