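Based on this post on Stack Overflow, I tried counting values like this:
df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))
My data has 22 unique genres, but after splitting I get 42 columns, so the values are clearly not unique.
Data sample: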
Action Adventure Casual Design & Illustration Early Access Education Free to Play Indie Massively Multiplayer Photo Editing RPG Racing Simulation Software Training Sports Strategy Utilities Video Production Web Publishing Accounting Action Adventure Animation & Modeling Audio Production Casual Design & Illustration Early Access Education Free to Play Indie Massively Multiplayer Photo Editing RPG Racing Simulation Software Training Sports Strategy Utilities Video Production Web Publishing nan
0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
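(I only pasted the header and the first row.)
I think the problem comes from my raw data: my column (genres) holds list-like strings wrapped in square brackets.
Example: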
[Action,Indie]
So when Python reads it, it treats "[Action", "Action" and "Action]" as different values, and the output has 303 distinct values.
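To make the effect concrete, here is a minimal sketch using a few made-up rows (the sample strings are assumptions, not taken from the real CSV). It shows how splitting on commas without cleaning leaves brackets and leading spaces attached to the tokens, which inflates the number of distinct values:

```python
import pandas as pd

# Hypothetical rows imitating the bracketed, comma-separated genres column
genres = pd.Series(["[Action, Indie]", "[Action]", "[Indie, Action]"])

# Splitting the raw strings keeps brackets and spaces inside the tokens
raw_tokens = sorted(set(genres.str.split(",").explode()))

# Stripping the brackets and removing spaces first collapses the variants
clean = genres.str.strip("[]").str.replace(" ", "", regex=False)
clean_tokens = sorted(set(clean.str.split(",").explode()))

print(len(raw_tokens), raw_tokens)
print(len(clean_tokens), clean_tokens)
```

With only two real genres in the sample, the raw split already produces more than two distinct tokens, which is the same kind of inflation described in the question.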
So this is what I did:
new = []
for i in df1['genres'].tolist():
    if str(i) != 'nan':
        # drop the first and last character (the square brackets)
        i = i[1:-1]
        new.append(i)
    else:
        new.append('nan')
Best answer
You need to remove the leading and trailing [] from the genres column with str.strip, and then replace the spaces with an empty string using str.replace:
import pandas as pd
df = pd.read_csv('test/Copy of AppCrawler.csv', sep="\t")
df['genres'] = df['genres'].str.strip('[]')
df['genres'] = df['genres'].str.replace(' ', '')
df = df.join(df.genres.str.split(",").apply(pd.value_counts).fillna(0))
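# temporarily display up to 30 rows and 60 columns
with pd.option_context('display.max_rows', 30, 'display.max_columns', 60):
    print(df)
# (DataFrame output omitted for clarity)
print(df.columns)
# The output below is from the original Python 2 run, hence the u'...' prefixes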
Index([u'Unnamed: 0', u'appid', u'currency', u'final_price', u'genres',
u'initial_price', u'is_free', u'metacritic', u'release_date',
u'Accounting', u'Action', u'Adventure', u'Animation&Modeling',
u'AudioProduction', u'Casual', u'Design&Illustration', u'EarlyAccess',
u'Education', u'FreetoPlay', u'Indie', u'MassivelyMultiplayer',
u'PhotoEditing', u'RPG', u'Racing', u'Simulation', u'SoftwareTraining',
u'Sports', u'Strategy', u'Utilities', u'VideoProduction',
u'WebPublishing'],
dtype='object')
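As an alternative to split plus value_counts, the same indicator columns can probably be built with Series.str.get_dummies. This is a different technique from the one in the answer, shown here only as a sketch. The small DataFrame is a made-up stand-in for the CSV; only the column names appid and genres come from the output above:

```python
import pandas as pd

# Hypothetical stand-in for the real file; the values are invented
df = pd.DataFrame({
    "appid": [10, 20, 30],
    "genres": ["[Action, Indie]", "[RPG]", None],
})

# Same cleaning as in the answer: strip the brackets, remove spaces
cleaned = df["genres"].str.strip("[]").str.replace(" ", "", regex=False)

# One 0/1 column per genre; rows with a missing genres value get all zeros
indicators = cleaned.str.get_dummies(sep=",")

result = df.join(indicators)
print(result)
```

One difference from the value_counts approach is that missing genres do not produce a separate nan column, so no fillna(0) step is needed.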
Source: "python - Python Pandas - value_counts not working properly", from a similar question on Stack Overflow: https://stackoverflow.com/questions/34089108/