python - What is the most elegant way to evaluate sports match estimates in Python?

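I'd like to evaluate estimates of sports games, in my case football (i.e. soccer) games. I'd like to use Python for this.

Basically, there is always a team_home result, a team_away result, an estimate_home and an estimate_away. For example, a game ended 1:0 and the estimate was 0:0 - this would return wrong.

There are only four possible cases and results: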

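wrong: as in the situation above
tendency: the winner was estimated correctly, but the goal difference was not (e.g. 3:0)
goal difference: the goal difference was estimated correctly, e.g. 2:1
right: the estimate was exactly correct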
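
To make the four cases concrete, here is a minimal plain-Python sketch for a single game. The function name classify and its argument order are assumptions made for illustration; they do not come from the question.

```python
def classify(home, away, est_home, est_away):
    """Return 'right', 'goal difference', 'tendency' or 'wrong'."""
    # exact score estimated
    if (home, away) == (est_home, est_away):
        return 'right'
    diff = home - away
    est_diff = est_home - est_away
    # same margin, but not the exact score
    if diff == est_diff:
        return 'goal difference'
    # same sign of the margin means the same winner was estimated
    sign = lambda d: (d > 0) - (d < 0)
    if sign(diff) == sign(est_diff):
        return 'tendency'
    return 'wrong'
```

With the 1:0 game from the example, classify(1, 0, 0, 0) gives 'wrong' and classify(1, 0, 3, 0) gives 'tendency'.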

What would be the most elegant way to handle the estimates and results in Python?

Best answer

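First, I urge you to give some thought to the questions you will want answered, i.e.:

Do you want to report to each player a list of their estimates versus the actuals?
Do you want to rank the players?
Do you want to do more statistical work? (e.g. player x is better at estimating games involving team y)

I'll assume you want to do at least the first two!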

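I have tried to keep the code readable and simple. In many ways it is more complex than the other answers, but it also gives you a complete toolbox to work with, and it handles large amounts of data really quickly. So consider it another option :)

Basically, with pandas you can also do more statistical work in the future if you want to. But really, these kinds of questions do affect the answer to your question (or rather: which of the answers here fits best).

I'm assuming you have a database (relational / mongodb / whatever); I fake it here by adding lists. Even though I use pandas here, most of what you describe can also be done in a very simple way in a relational database. But pandas rocks ;), so this works fine too. If you do this with friends using Excel or CSV files, you can also import those files directly with pandas read_csv or read_excel.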
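
As a rough sketch of the CSV route: the snippet below reads bets from CSV text held in memory, so it runs without a file on disk. The column names are assumed to match the bet_list used further down, and the variable names csv_text and bet_df_from_csv are made up for this illustration.

```python
import io
import pandas as pd

# Stand-in for a real file; columns are assumed to match bet_list
csv_text = """playerid,game,date,home_team,away_team,home_goals,away_goals
1,1,1,Bayern,VfL,3,5
2,1,1,Bayern,VfL,2,1
"""

# pd.read_csv accepts a file path or any file-like object
bet_df_from_csv = pd.read_csv(io.StringIO(csv_text))
```

With a real file you would pass its path to pd.read_csv instead of the StringIO object.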

import pandas as pd

# game is a unique id (like a combination of date, home_team and away_team)
bet_list = [
    {'playerid': 1, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
    {'playerid': 1, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
    {'playerid': 1, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0}
]

result_list = [
    {'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 4},
    {'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 2},
    {'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
]

def calculate_result(input_df):
    # outcome codes: 1 = home win, 2 = away win, 3 = draw
    input_df['result'] = 0
    # .loc avoids chained assignment, which pandas warns about
    # and which may not modify the frame at all
    # home wins (result 1)
    mask = input_df['home_goals'] > input_df['away_goals']
    input_df.loc[mask, 'result'] = 1
    # away wins (result 2)
    mask = input_df['home_goals'] < input_df['away_goals']
    input_df.loc[mask, 'result'] = 2
    # draws (result 3)
    mask = input_df['home_goals'] == input_df['away_goals']
    input_df.loc[mask, 'result'] = 3
    # goal difference
    input_df['goal_difference'] = input_df['home_goals'] - input_df['away_goals']
    return input_df

# so what were the expectations?
bet_df = pd.DataFrame(bet_list)
bet_df = calculate_result(bet_df)
# if you want to look at the results
bet_df

# what were the actuals
result_df = pd.DataFrame(result_list)
result_df = calculate_result(result_df)
# if you want to look at the results
result_df

# now let's compare them!
# I take a subset of the result df and link results on the game
combi_df = pd.merge(
    left=bet_df,
    right=result_df[['game', 'home_goals', 'away_goals', 'result', 'goal_difference']],
    on='game',
    how='inner',
    suffixes=('_bet', '_actual'),
)
# look at the data
combi_df

def calculate_bet_score(input_df):
    '''
    Score each bet: 0 = wrong, 1 = correct result, 2 = correct goal
    difference as well, 3 = exact score.

    Notice that I'm keeping extra columns, because those are nice for
    comparative analytics in the future. Think: "you had this right,
    just like x% of all the people"
    '''
    input_df['bet_score'] = 0
    # now look at where people have correctly predicted the result
    input_df['result_estimation'] = 0
    mask = input_df['result_bet'] == input_df['result_actual']
    input_df.loc[mask, 'result_estimation'] = 1  # correct result
    input_df.loc[mask, 'bet_score'] = 1  # bet score for a correct result
    # now look at where people have correctly predicted the difference in goals
    # when they already predicted the result correctly
    input_df['goal_difference_estimation'] = 0
    bet_mask = input_df['bet_score'] == 1
    score_mask = input_df['goal_difference_bet'] == input_df['goal_difference_actual']
    input_df.loc[bet_mask & score_mask, 'goal_difference_estimation'] = 1  # correct goal difference
    input_df.loc[bet_mask & score_mask, 'bet_score'] = 2  # bet score for a correct goal difference
    # now look at where people have correctly predicted the exact goals
    input_df['goal_exact_estimation'] = 0
    bet_mask = input_df['bet_score'] == 2
    home_mask = input_df['home_goals_bet'] == input_df['home_goals_actual']
    away_mask = input_df['away_goals_bet'] == input_df['away_goals_actual']
    input_df.loc[bet_mask & home_mask & away_mask, 'goal_exact_estimation'] = 1  # exact score
    input_df.loc[bet_mask & home_mask & away_mask, 'bet_score'] = 3  # bet score for an exact score
    return input_df

combi_df = calculate_bet_score(combi_df)

# now look at the results
combi_df

# and you can do nifty stuff like making a top player list like this:
# (Series.order() was removed from pandas; sort_values() replaces it)
combi_df.groupby('playerid')['bet_score'].sum().sort_values(ascending=False)
# player 4 is way ahead!
# which game was the best estimated game?
combi_df.groupby('game')['bet_score'].mean().sort_values(ascending=False)
# game 3! though abysmal predictions in general ;)
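
The third question from the top of this answer (is player x better at games involving team y?) can be approached the same way. The sketch below is only an illustration: the scored frame holds hypothetical, made-up scores shaped like three columns of combi_df, and grouping on home_team alone is a simplifying assumption.

```python
import pandas as pd

# Hypothetical scored bets (invented values, not the results computed above)
scored = pd.DataFrame({
    'playerid':  [1, 1, 4, 4],
    'home_team': ['Bayern', 'VfL', 'Bayern', 'VfL'],
    'bet_score': [1, 0, 2, 3],
})

# average score per player and home team, one column per team
per_team = (
    scored.groupby(['playerid', 'home_team'])['bet_score']
    .mean()
    .unstack()
)
```

Each row of per_team is a player and each column a team, so a player's strong and weak teams can be read across the row.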

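As I said, this is mainly meant to give a different view of the data manipulation possibilities in Python. Once you get serious with lots of data, this (vector / numpy / pandas based) approach will be the fastest, but you have to ask yourself which logic you want to perform inside the database and which outside of it, etc.

Hope this helps!

Regarding "python - What is the most elegant way to evaluate sports match estimates in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20828856/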
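
As one small illustration of that vectorised style, the numeric bet_score can be translated back into the four labels from the question in a single step. The mapping 0 to 3 follows the scoring used in calculate_bet_score; the frame scores and the column name verdict are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical scores, one per bet (invented values for illustration)
scores = pd.DataFrame({'playerid': [1, 2, 3, 4], 'bet_score': [0, 1, 2, 3]})

# score -> label, in the order the scoring function awards points
labels = {0: 'wrong', 1: 'tendency', 2: 'goal difference', 3: 'right'}

# Series.map applies the lookup to the whole column at once
scores['verdict'] = scores['bet_score'].map(labels)
```

Keeping the numeric score and adding the label as a separate column means sums and means still work for rankings.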