我是熊猫的新手,我必须使用包含几年记录的csv作为源,每周生成一张数据透视表。
我找不到按周筛选数据框的正确方法。这是我手动执行的操作:
fmindate = (df.fecha.astype( 'datetime64[ns]' ) >= pd.to_datetime( "2017-03-01" ))
fmaxdate = (df.fecha.astype( 'datetime64[ns]' ) <= pd.to_datetime( "2018-01-15" ))
dffiltered = df[ (fmindate & fmaxdate) ]
txt = pd.pivot_table(
dffiltered,
columns=[ "fecha" ],
index=[ "org", "tipo", "estado" ],
values=[ "destination", "time_total", "time_avg" ],
aggfunc={ "destination": len, "time_total": total_secs_inTimeSerie,
"time_avg": mean_secs_inTimeSerie },
fill_value="", margins=True
)
with open(report_name, "w") as text_file:
text_file.write ( txt.to_html() )
什么是正确的方法?
提前非常感谢您!
最佳答案
您可以使用weekofyear
:
rng = pd.date_range('2017-04-03', periods=6, freq='6M')
df = pd.DataFrame({'org':list('aaabbb'),
'estado':list('cccbbb'),
'destination':[4,5,4,5,5,4],
'time_total':[7,8,9,4,2,3],
'time_avg':[1,3,5,7,1,0],
'fecha':rng,
'tipo':list('aaabbb')})
df["fecha"] = df["fecha"].dt.weekofyear
print (df)
destination estado fecha org time_avg time_total tipo
0 4 c 17 a 1 7 a
1 5 c 44 a 3 8 a
2 4 c 18 a 5 9 a
3 5 b 44 b 7 4 b
4 5 b 18 b 1 2 b
5 4 b 44 b 0 3 b
def total_secs_inTimeSerie(x):
return x.sum()
def mean_secs_inTimeSerie(x):
return x.mean()
txt = pd.pivot_table(
df,
columns=[ "fecha" ],
index=[ "org", "tipo", "estado" ],
values=[ "destination", "time_total", "time_avg" ],
aggfunc={ "destination": len, "time_total": total_secs_inTimeSerie,
"time_avg": mean_secs_inTimeSerie },
fill_value="", margins=True
)
print (txt)
destination time_avg \
fecha 17 18 44 All 17 18 44 All
org tipo estado
a a c 1 1.0 1.0 3 1 5.0 3.000000 3.000000
b b b 1.0 2.0 3 1.0 3.500000 2.666667
All 1 2.0 3.0 6 1 3.0 3.333333 2.833333
time_total
fecha 17 18 44 All
org tipo estado
a a c 7 9.0 8.0 24
b b b 2.0 7.0 9
All 7 11.0 15.0 33