Problem description
I've got a DataFrame with the metadata for a newspaper article in each row. I'd like to group these into monthly chunks, then count the values of one column (called type):
monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_articles = monthly_articles["type"].value_counts().unstack()
This works fine with an annual group but fails when I try to group by month:
ValueError: operands could not be broadcast together with shape (141,) (139,)
I think this is because there are some month groups in which there are no articles. If I iterate the groups and print value_counts on each group:
for name, group in articles.groupby(pd.Grouper(freq="M")):
    print(name, group["type"].value_counts())
I get empty series in the groups for Jan and Feb of 2006:
2005-12-31 00:00:00 positive 1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative 6
positive 5
neutral 1
Name: type, dtype: int64
2006-04-30 00:00:00 negative 11
positive 6
neutral 3
Name: type, dtype: int64
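A minimal sketch that reproduces the empty monthly groups seen in the output above and skips them explicitly before counting. The articles data is hypothetical toy data, and the sketch uses freq="MS" (bins labelled by month start) instead of "M", because pandas 2.2 deprecates the "M" alias in favour of "ME" and "MS" works across versions.

```python
import pandas as pd

# Hypothetical toy data: one article in Dec 2005, none in Jan or Feb 2006.
articles = pd.DataFrame(
    {"type": ["positive", "negative", "negative", "positive", "neutral"]},
    index=pd.to_datetime(
        ["2005-12-15", "2006-03-02", "2006-03-20", "2006-03-28", "2006-04-10"]
    ),
)

# A time-based Grouper emits one bin per calendar month in the range,
# including months that contain no rows at all.
monthly = articles.groupby(pd.Grouper(freq="MS"))
group_sizes = {name.strftime("%Y-%m"): len(group) for name, group in monthly}
print(group_sizes)
# {'2005-12': 1, '2006-01': 0, '2006-02': 0, '2006-03': 3, '2006-04': 1}

# Skip the empty months and count types in the rest. concat() uses the
# dict keys as the outer index level; unstack() turns the type labels
# into columns, filling missing (month, type) combinations with 0.
per_month = {
    name: group["type"].value_counts()
    for name, group in monthly
    if not group.empty
}
monthly_counts = pd.concat(per_month).unstack(fill_value=0)
print(monthly_counts)
```

Filtering with group.empty keeps the logic explicit, at the cost of a Python-level loop over the groups.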
How can I ignore the empty groups when using value_counts()?
I've tried dropna=False without success. I think this is the same issue as this question.
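A short illustration of why dropna=False does not help here: the parameter only decides whether NaN values inside a group are counted, and it has no effect on a group that has no rows at all. The series below are hypothetical examples.

```python
import numpy as np
import pandas as pd

# dropna controls whether NaN *values* are counted ...
with_nan = pd.Series(["positive", np.nan, "positive"], name="type")
print(with_nan.value_counts())              # only 'positive': 2
print(with_nan.value_counts(dropna=False))  # 'positive': 2 and NaN: 1

# ... but an empty group has no values, so the result is empty
# whatever dropna is set to.
empty = pd.Series([], dtype=object, name="type")
print(empty.value_counts(dropna=False).empty)  # True
```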
Recommended answer
It would help if you gave us a data sample; otherwise it is hard to pinpoint the problem. From your code snippet, it seems that the type data is missing for some months. You can call apply on the grouped object and then call unstack. Here is sample code that works for me, using randomly generated data:
import numpy as np
import pandas as pd

# Random sample data: one label per day for 150 days starting 2017-01-01
s = pd.Series(['positive', 'negative', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]
df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01', periods=150))
# Group by month, run value_counts per group, then pivot the labels into columns
gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
In [75]: dfx
Out[75]:
negative neutral positive
2017-01-31 13 9 9
2017-02-28 11 11 6
2017-03-31 12 6 13
2017-04-30 8 12 10
2017-05-31 9 10 11
If some months contain only null values:
In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
...: gp = df.groupby(pd.Grouper(freq='M'))
...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
...:
In [77]: dfx
Out[77]:
negative neutral positive
2017-01-31 13 9 9
2017-04-30 8 12 9
2017-05-31 9 10 11
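A sketch of an alternative that avoids apply: group by the month and the type column at the same time, count with size(), then unstack. It assumes a frame shaped like the answer's df (a DatetimeIndex and an atype column); the data below is a hypothetical deterministic stand-in for the random sample. Because the month key comes from index.to_period("M") rather than a time Grouper, only (month, type) pairs that actually occur are produced, and rows whose atype is NaN are dropped by groupby's default dropna=True, so all-null months never appear.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic stand-in for the answer's random data:
# labels cycle positive/negative/neutral, one row per day.
idx = pd.date_range("2017-01-01", "2017-04-30", freq="D")
labels = np.array(["positive", "negative", "neutral"], dtype=object)
df = pd.DataFrame({"atype": labels[np.arange(len(idx)) % 3]}, index=idx)

# Blank out February and March entirely.
df.loc["2017-02-01":"2017-03-31", "atype"] = np.nan

# Group on (month, type) at once; NaN types are dropped and only observed
# pairs are produced, so the empty months are simply absent.
month = df.index.to_period("M")
counts = df.groupby([month, "atype"]).size().unstack(fill_value=0)
print(counts)
```

The row labels are monthly periods (2017-01, 2017-04) rather than month-end timestamps; if month-end labels are needed, counts.index.to_timestamp(how="end") is one option.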
Thanks.