我是熊猫的新手,我必须使用包含几年记录的csv作为源,每周生成一张数据透视表。

我找不到按周筛选数据框的正确方法。这是我手动执行的操作:

    fmindate = (df.fecha.astype( 'datetime64[ns]' ) >= pd.to_datetime( "2017-03-01" ))
    fmaxdate = (df.fecha.astype( 'datetime64[ns]' ) <= pd.to_datetime( "2018-01-15" ))

    dffiltered = df[ (fmindate & fmaxdate) ]
    txt = pd.pivot_table(
            dffiltered,
            columns=[ "fecha" ],
            index=[ "org", "tipo", "estado" ],
            values=[ "destination", "time_total", "time_avg" ],
            aggfunc={ "destination": len, "time_total": total_secs_inTimeSerie,
                      "time_avg": mean_secs_inTimeSerie },
            fill_value="", margins=True
    )
    with open(report_name, "w") as text_file:
        text_file.write ( txt.to_html() )


什么是正确的方法?

提前非常感谢您!

最佳答案

您可以使用weekofyear

rng = pd.date_range('2017-04-03', periods=6, freq='6M')

df = pd.DataFrame({'org':list('aaabbb'),
                   'estado':list('cccbbb'),
                   'destination':[4,5,4,5,5,4],
                   'time_total':[7,8,9,4,2,3],
                   'time_avg':[1,3,5,7,1,0],
                   'fecha':rng,
                   'tipo':list('aaabbb')})


df["fecha"] = df["fecha"].dt.weekofyear
print (df)
   destination estado  fecha org  time_avg  time_total tipo
0            4      c     17   a         1           7    a
1            5      c     44   a         3           8    a
2            4      c     18   a         5           9    a
3            5      b     44   b         7           4    b
4            5      b     18   b         1           2    b
5            4      b     44   b         0           3    b




def total_secs_inTimeSerie(x):
    return x.sum()

def mean_secs_inTimeSerie(x):
    return x.mean()

txt = pd.pivot_table(
            df,
            columns=[ "fecha" ],
            index=[ "org", "tipo", "estado" ],
            values=[ "destination", "time_total", "time_avg" ],
            aggfunc={ "destination": len, "time_total": total_secs_inTimeSerie,
                      "time_avg": mean_secs_inTimeSerie },
            fill_value="", margins=True
    )




print (txt)
                destination               time_avg                           \
fecha                    17   18   44 All       17   18        44       All
org tipo estado
a   a    c                1  1.0  1.0   3        1  5.0  3.000000  3.000000
b   b    b                   1.0  2.0   3           1.0  3.500000  2.666667
All                       1  2.0  3.0   6        1  3.0  3.333333  2.833333

                time_total
fecha                   17    18    44 All
org tipo estado
a   a    c               7   9.0   8.0  24
b   b    b                   2.0   7.0   9
All                      7  11.0  15.0  33

09-06 08:19