Based on this post on Stack Overflow, I tried a value-counts approach like this:

df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))

Although my data has only 22 unique genres, after splitting I get 42 values, which are obviously not unique.
Data sample:

     Action  Adventure   Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG     Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing Accounting  Action  Adventure   Animation & Modeling    Audio Production    Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing  nan
0   nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan


(I pasted only the header and the first row.)

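I suspect the problem comes from my raw data: each value in my genres column is a list written as a string, including the square brackets.

Example: [Action,Indie]
So when Python reads it, it treats [Action, Action and Action] as three different values, and the output ends up with 303 distinct values.
So what I did was: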
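To see why the counts multiply, here is a small sketch on hypothetical values shaped like the question's genre strings (assuming a comma followed by a space between genres). Splitting on "," alone leaves the brackets and the leading spaces attached, so the same genre shows up as several different tokens:

```python
import pandas as pd

# Hypothetical values mimicking the question's bracketed genre strings
s = pd.Series(["[Action, Indie]", "[Indie, Action]", "[Action]"])

# Splitting on "," alone keeps the brackets and the leading spaces
tokens = s.str.split(",").explode()
print(sorted(tokens.unique()))
# → [' Action]', ' Indie]', '[Action', '[Action]', '[Indie']
```

Two real genres produce five distinct tokens, which is the same inflation seen in the question's 42-column header.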

new = []
for i in df1['genres'].tolist():
    if str(i) != 'nan':
        # drop the surrounding square brackets
        i = i[1:-1]
        new.append(i)
    else:
        # keep a placeholder for missing genres
        new.append('nan')
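The loop above can be sketched in vectorized form with pandas string slicing. The frame below is a hypothetical stand-in for df1. Unlike the loop, this keeps missing values as real NaN rather than the string 'nan'. Note that it still leaves the space in front of "Indie", so splitting on "," afterwards would still yield " Indie":

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the question's df1
df1 = pd.DataFrame({"genres": ["[Action, Indie]", np.nan, "[RPG]"]})

# .str[1:-1] drops the first and last character of every string; NaN stays NaN
new = df1["genres"].str[1:-1]
print(new.tolist())
# → ['Action, Indie', nan, 'RPG']
```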

Best answer

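You have to remove the leading [ and trailing ] from the genres column with str.strip, and then remove the spaces with str.replace, replacing each space with an empty string: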
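Because the answer's code reads a local CSV, here is a self-contained sketch of the same two cleaning steps on a small hypothetical frame. For the counting step it swaps in Series.explode plus pd.get_dummies instead of apply(pd.value_counts), since top-level pd.value_counts is deprecated in recent pandas releases; the resulting 0/1 columns match what the answer produces for rows without repeated genres:

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for the question's CSV
df = pd.DataFrame({"genres": ["[Action, Indie]", "[Indie, Early Access]", np.nan]})

# Step 1: strip the surrounding brackets; step 2: delete every space
cleaned = df["genres"].str.strip("[]").str.replace(" ", "", regex=False)

# One row per genre token (NaN rows stay NaN), one-hot encode,
# then sum back to one row per original index label
tokens = cleaned.str.split(",").explode()
counts = pd.get_dummies(tokens).groupby(level=0).sum()

df = df.join(counts)
print(df)
```

As in the answer's output, removing every space also merges multi-word genres, so "Early Access" becomes "EarlyAccess".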

import pandas as pd

df = pd.read_csv('test/Copy of AppCrawler.csv', sep="\t")


df['genres'] = df['genres'].str.strip('[]')
df['genres'] = df['genres'].str.replace(' ', '')

df = df.join(df.genres.str.split(",").apply(pd.value_counts).fillna(0))

# temporarily display up to 30 rows and 60 columns
with pd.option_context('display.max_rows', 30, 'display.max_columns', 60):
    print(df)
    # (printed frame omitted for clarity)
print(df.columns)
Index([u'Unnamed: 0', u'appid', u'currency', u'final_price', u'genres',
       u'initial_price', u'is_free', u'metacritic', u'release_date',
       u'Accounting', u'Action', u'Adventure', u'Animation&Modeling',
       u'AudioProduction', u'Casual', u'Design&Illustration', u'EarlyAccess',
       u'Education', u'FreetoPlay', u'Indie', u'MassivelyMultiplayer',
       u'PhotoEditing', u'RPG', u'Racing', u'Simulation', u'SoftwareTraining',
       u'Sports', u'Strategy', u'Utilities', u'VideoProduction',
       u'WebPublishing'],
      dtype='object')
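If you would rather keep names such as "Early Access" intact, one alternative (a different technique from the answer's, shown on hypothetical data) is to strip the brackets, normalize the separators to a bare comma, and let Series.str.get_dummies build one 0/1 column per genre:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the question's data
df = pd.DataFrame({"genres": ["[Action, Indie]", "[Indie, Early Access]", np.nan]})

# Strip the brackets and collapse ", " (or ",") to ","; spaces inside names survive
normalised = df["genres"].str.strip("[]").str.replace(r"\s*,\s*", ",", regex=True)

# str.get_dummies splits on sep and creates one column per distinct genre;
# missing rows become all zeros
dummies = normalised.str.get_dummies(sep=",")
df = df.join(dummies)
print(df.columns.tolist())
# → ['genres', 'Action', 'Early Access', 'Indie']
```

The regex handles both "[Action,Indie]" and "[Action, Indie]" spellings, so the column names do not depend on how consistently the source file spaced its separators.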

Source: "python - Python Pandas - value_counts not working properly" on Stack Overflow: https://stackoverflow.com/questions/34089108/

10-12 17:23