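I would like to use Pandas and Python to iterate through my .csv file and group the data by season, calculating the mean for each season of the year. Currently, the quarterly script groups January-March, April-June, and so on. I would like the seasons to correspond to months as follows: 11: 'Winter', 12: 'Winter', 1: 'Winter', 2: 'Spring', 3: 'Spring', 4: 'Spring', 5: 'Summer', 6: 'Summer', 7: 'Summer', 8: 'Autumn', 9: 'Autumn', 10: 'Autumn'.
I have the following data: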
Date,HAD
01/01/1951,1
02/01/1951,-0.13161201
03/01/1951,-0.271796132
04/01/1951,-0.258977158
05/01/1951,-0.198823057
06/01/1951,0.167794502
07/01/1951,0.046093808
08/01/1951,-0.122396694
09/01/1951,-0.121824587
10/01/1951,-0.013002463
This is my code so far:
# Iterate through a list of files in a folder looking for .csv files
for csvfilename in glob.glob("C:/Users/n-jones/testdir/output/*.csv"):
    # Allocate a new file name for each file and create a new .csv file
    csvfilenameonly = "RBI-Seasons-Year" + path_leaf(csvfilename)
    with open("C:/Users/n-jones/testdir/season/" + csvfilenameonly, "wb") as outfile:
        # Open the input csv file and allow the script to read it
        with open(csvfilename, "rb") as infile:
            # Create a pandas dataframe to summarise the data
            df = pd.read_csv(infile, parse_dates=[0], index_col=[0], dayfirst=True)
            mean = df.resample('Q-SEP', how='mean')
            # Output to new csv file
            mean.to_csv(outfile)
I hope this makes sense.
Thank you in advance!
Best answer
It looks like you just need a dict lookup and a groupby. The code below should work.
import pandas as pd
import os
import re

# Map each month number to its season, as defined in the question
lookup = {
    11: 'Winter',
    12: 'Winter',
    1: 'Winter',
    2: 'Spring',
    3: 'Spring',
    4: 'Spring',
    5: 'Summer',
    6: 'Summer',
    7: 'Summer',
    8: 'Autumn',
    9: 'Autumn',
    10: 'Autumn'
}

os.chdir('C:/Users/n-jones/testdir/output/')
for fname in os.listdir('.'):
    # Only process files whose names end in "csv"
    if re.match(".*csv$", fname):
        data = pd.read_csv(fname, parse_dates=[0], dayfirst=True)
        data['Season'] = data['Date'].apply(lambda x: lookup[x.month])
        data['count'] = 1
        # Select both columns with a list; tuple-style selection fails on newer pandas
        data = data.groupby(['Season'])[['HAD', 'count']].sum()
        data['mean'] = data['HAD'] / data['count']
        data.to_csv('C:/Users/n-jones/testdir/season/' + fname)
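Since the question asks for the mean of each season of each year, a variant that also groups by year may be closer to what is wanted. The following is only a minimal sketch, not part of the original answer. It assumes the dates are in day/month/year order (consistent with dayfirst=True above), and the function name seasonal_means and the Year column are made up for illustration. It lets groupby compute the mean directly instead of dividing a sum by a count.

```python
import io

import pandas as pd

# Month-to-season mapping taken from the question.
lookup = {
    11: 'Winter', 12: 'Winter', 1: 'Winter',
    2: 'Spring', 3: 'Spring', 4: 'Spring',
    5: 'Summer', 6: 'Summer', 7: 'Summer',
    8: 'Autumn', 9: 'Autumn', 10: 'Autumn',
}


def seasonal_means(source):
    """Return the mean of HAD per (Year, Season); hypothetical helper name.

    `source` is anything pd.read_csv accepts (a path or a file-like object).
    """
    data = pd.read_csv(source)
    # Assumption: dates are written as day/month/year, e.g. 02/01/1951 = 2 Jan 1951.
    data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')
    # Vectorised lookup of the season for each row's month.
    data['Season'] = data['Date'].dt.month.map(lookup)
    data['Year'] = data['Date'].dt.year
    return data.groupby(['Year', 'Season'])['HAD'].mean()


# The first rows of the sample data from the question, inlined so this runs without files.
sample = io.StringIO(
    "Date,HAD\n"
    "01/01/1951,1\n"
    "02/01/1951,-0.13161201\n"
    "03/01/1951,-0.271796132\n"
    "04/01/1951,-0.258977158\n"
)
print(seasonal_means(sample))
```

Note that this groups by calendar year, so November and December of one year land in the same Winter group as January of that same year, even though they belong to different winters. If that matters, the Year column would need to be shifted for those months before grouping.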
Regarding python - grouping data by season using Python and pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22615288/