How can I ignore empty Series when using value_counts on a Pandas groupby?



I've got a DataFrame with the metadata for a newspaper article in each row. I'd like to group these into monthly chunks, then count the values of one column (called type):

import pandas as pd

monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_counts = monthly_articles["type"].value_counts().unstack()


This works fine with an annual group but fails when I try to group by month:

ValueError: operands could not be broadcast together with shape (141,) (139,)


I think this is because there are some month groups in which there are no articles. If I iterate the groups and print value_counts on each group:

for name, group in monthly_articles:
    print(name, group["type"].value_counts())


I get empty series in the groups for Jan and Feb of 2006:

2005-12-31 00:00:00 positive    1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative    6
positive    5
neutral     1
Name: type, dtype: int64
2006-04-30 00:00:00 negative    11
positive     6
neutral      3
Name: type, dtype: int64


How can I ignore the empty groups when using value_counts()?


I've tried dropna=False without success. I think this is the same issue as this question.
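For reference, here is a minimal sketch of what dropna controls in value_counts. The data is made up for illustration and is not the asker's. The option only decides whether NaN entries are counted; a Series with no rows still yields an empty result, which is probably why it makes no difference here.

```python
import numpy as np
import pandas as pd

# Made-up sample: three labelled rows and one missing value
s = pd.Series(["positive", "negative", np.nan, "positive"], name="type")

# Default: NaN is left out of the counts
print(s.value_counts())

# dropna=False: NaN gets its own entry in the counts
print(s.value_counts(dropna=False))

# A Series with no rows gives an empty result either way
empty = s.iloc[0:0]
print(len(empty.value_counts(dropna=False)))
```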



A data sample would help; without one it is hard to pinpoint the problem. From your code snippet, it looks as though the type data for some months is null. You can call apply on the grouped object and then unstack the result. Here is sample code that works for me, using randomly generated data:

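import numpy as np
import pandas as pd

# Randomly pick 150 labels from three article types ('negtive' is spelled as in the output below)
s = pd.Series(['positive', 'negtive', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]

# One row per day, starting 2017-01-01
df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01', periods=150))

# Count the labels within each month, then pivot the labels into columns
# (newer pandas versions prefer freq='ME' for month-end)
gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()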

In [75]: dfx
            negtive  neutral  positive
2017-01-31       13        9         9
2017-02-28       11       11         6
2017-03-31       12        6        13
2017-04-30        8       12        10
2017-05-31        9       10        11


In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
    ...: gp = df.groupby(pd.Grouper(freq='M'))
    ...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()

In [77]: dfx
            negtive  neutral  positive
2017-01-31       13        9         9
2017-04-30        8       12         9
2017-05-31        9       10        11
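Another option, sketched here as an alternative rather than as the method used above, is to group by month and by type in a single step and count rows with size(). Only month/type combinations that actually occur are produced, so months without articles never appear as empty groups. The articles frame below is a hypothetical stand-in for the asker's data, and grouping on articles.index.to_period("M") is swapped in for pd.Grouper(freq="M"), so the resulting index holds monthly periods instead of month-end timestamps.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: a DatetimeIndex and a
# "type" column, with no rows at all in January or February 2006.
articles = pd.DataFrame(
    {"type": ["positive", "negative", "positive", "neutral", "negative"]},
    index=pd.to_datetime(
        ["2005-12-15", "2006-03-02", "2006-03-20", "2006-04-05", "2006-04-18"]
    ),
)

# Group by (month, type) together; size() counts the rows in each observed
# pair, and unstack(fill_value=0) pivots the types into columns, filling
# missing month/type combinations with 0.
monthly_counts = (
    articles.groupby([articles.index.to_period("M"), "type"])
    .size()
    .unstack(fill_value=0)
)
print(monthly_counts)
```

If the empty months should show up as rows of zeros instead, the result can be reindexed against a full monthly range afterwards.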


