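python - Count unique labels within a date range

I'm trying to run sentiment analysis on a bunch of text data I scraped from the internet. I've reached the point where my pandas DataFrame has the following columns I want to analyze: "post_date" (in dd-mm-yyyy format, e.g. 01-10-2017) and "Sentiment" (one of "positive", "neutral" or "negative").

I'd like to be able to count the number of posts per day/month/year, as well as the number of positive/neutral/negative posts per day.

For example, something like the output produced by:

print(df.Sentiment.value_counts())

But I'm stuck. I've tried many iterations of the groupby command (such as the one below), but I keep getting errors.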

df.groupby(df.post_date.dt.year)

Can anyone help me achieve this?

Ideally, the desired output would be:

Date, Positive_Posts, Negative_Posts, Neutral_Posts, Total_Posts
01/10/2017, 10, 5, 8, 23
02/10/2017, 5, 20, 5, 30

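Here Date is how the information is grouped (day, month, year, etc.), the pos/neg/neu columns are the number of posts with each label within that range, and Total_Posts is the total number of posts within that range.

The data currently looks like this: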

post_date, Sentiment
19/09/2017, positive
19/09/2017, positive
19/09/2017, positive
20/09/2017, negative
20/09/2017, neutral

Let me know if you need more information.

Best answer

You can use groupby + size + unstack + add_suffix + sum:

df1 = df.groupby(['post_date','Sentiment']).size().unstack(fill_value=0).add_suffix('_Posts')
df1['Total_Posts'] = df1.sum(axis=1)
print (df1)

Sentiment   negative_Posts  neutral_Posts  positive_Posts  Total_Posts
post_date
19/09/2017               0              0               3            3
20/09/2017               1              1               0            2
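
To make the chain above easier to follow, here is a minimal, self-contained sketch that rebuilds the sample data from the question and runs each step separately. The intermediate names counts and wide are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question; post_date is kept as plain strings.
df = pd.DataFrame({
    "post_date": ["19/09/2017", "19/09/2017", "19/09/2017",
                  "20/09/2017", "20/09/2017"],
    "Sentiment": ["positive", "positive", "positive",
                  "negative", "neutral"],
})

# Step 1: count rows per (date, label) pair. The result is a Series
# indexed by both post_date and Sentiment.
counts = df.groupby(["post_date", "Sentiment"]).size()

# Step 2: pivot the Sentiment level into columns; (date, label) pairs
# that never occur are filled with 0 instead of NaN.
wide = counts.unstack(fill_value=0)

# Step 3: rename the columns and add the per-date total.
df1 = wide.add_suffix("_Posts")
df1["Total_Posts"] = df1.sum(axis=1)

print(df1)
```

Since post_date is grouped as a string here, this counts per distinct date string and does not depend on the column being a real datetime.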

The one-line solution is very similar; it only needs assign:

df1 = (df.groupby(['post_date','Sentiment'])
        .size()
        .unstack(fill_value=0)
        .add_suffix('_Posts')
        .assign(Total_Posts=lambda x: x.sum(axis=1)))

print (df1)

Sentiment   negative_Posts  neutral_Posts  positive_Posts  Total_Posts
post_date
19/09/2017               0              0               3            3
20/09/2017               1              1               0            2
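
As an aside, the same table can probably be produced with pd.crosstab, which is a different technique from the groupby chain in the answer. The sketch below assumes the sample data from the question and is meant only as an alternative to compare against, not as the answer's method.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "post_date": ["19/09/2017", "19/09/2017", "19/09/2017",
                  "20/09/2017", "20/09/2017"],
    "Sentiment": ["positive", "positive", "positive",
                  "negative", "neutral"],
})

# crosstab counts how often each (post_date, Sentiment) pair occurs and
# fills combinations that never occur with 0.
df2 = (pd.crosstab(df["post_date"], df["Sentiment"])
         .add_suffix("_Posts")
         .assign(Total_Posts=lambda x: x.sum(axis=1)))

print(df2)
```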

To turn the index back into a regular column:

df1 = (df.groupby(['post_date','Sentiment'])
        .size()
        .unstack(fill_value=0)
        .add_suffix('_Posts')
        .assign(Total_Posts=lambda x: x.sum(axis=1))
        .reset_index()
        .rename_axis(None, axis=1))

print (df1)

    post_date  negative_Posts  neutral_Posts  positive_Posts  Total_Posts
0  19/09/2017               0              0               3            3
1  20/09/2017               1              1               0            2
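
The question also asks about grouping by month or year. The attempt df.groupby(df.post_date.dt.year) most likely failed because the .dt accessor only works on datetime-like values, while post_date holds strings. A minimal sketch of one way to handle this follows; the format string is an assumption based on the sample data (dd/mm/yyyy), and the helper count_by is hypothetical, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "post_date": ["19/09/2017", "19/09/2017", "19/09/2017",
                  "20/09/2017", "20/09/2017"],
    "Sentiment": ["positive", "positive", "positive",
                  "negative", "neutral"],
})

# Parse the day-first strings into real datetimes.
# Assumption: every value follows the dd/mm/yyyy pattern of the sample.
dates = pd.to_datetime(df["post_date"], format="%d/%m/%Y")

def count_by(period_code):
    """Count labels per period: 'D' = day, 'M' = month, 'Y' = year."""
    # Convert each timestamp to the period (day, month or year) it falls in.
    keys = dates.dt.to_period(period_code).rename("period")
    out = (df.groupby([keys, "Sentiment"])
             .size()
             .unstack(fill_value=0)
             .add_suffix("_Posts"))
    out["Total_Posts"] = out.sum(axis=1)
    return out

daily = count_by("D")
monthly = count_by("M")
yearly = count_by("Y")

print(monthly)
```

With the five sample rows, all posts fall in the same month and year, so monthly and yearly each contain a single row, while daily has one row per date.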

For more on python - Count unique labels within a date range, see the similar question on Stack Overflow: https://stackoverflow.com/questions/46525175/