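I am running a conditional loop to create a new column in my DataFrame (tdf) based on the values in the column 'alone'.
If the value is 0, the new value should be the string 'alone', otherwise 'with family'.
I am using the code: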

tdf['alone'].loc[['alone'] >0]= 'with family'
tdf['alone'].loc[['alone'] ==0] = 'alone'


After running the lines above, I get the following error:

KeyError: 'cannot use a single bool to index into setitem'


I referred to this same question, and what I gathered is that I need to include a row_indexer, as in tdf['alone'].loc[row_indexer, ['alone']] = 'alone', but I am not sure how to get the values for row_indexer.

Best answer

You need loc with boolean indexing and a boolean mask. Compare the DataFrame column with 0, not the one-item list ['alone']:

tdf.loc[tdf['alone'] > 0, 'alone'] = 'with family'
tdf.loc[tdf['alone'] ==0, 'alone'] = 'alone'
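
To see why the original comparison cannot work as a row selector, it may help to look at what each expression evaluates to. The sketch below reuses the sample data from this answer; the names scalar_result and mask are only illustrative. Comparing the list ['alone'] with 0 gives a single Python bool, which is what the error message complains about, while comparing the column gives one bool per row.

```python
import pandas as pd

tdf = pd.DataFrame({'alone': [4, 4, 5, 0, 5, 0],
                    'col': [1, 1, 9, 4, 2, 3]})

# A plain Python list compared with 0 yields one bool, not a row mask.
scalar_result = (['alone'] == 0)

# The column compared with 0 yields a boolean Series, one value per row.
mask = (tdf['alone'] == 0)

print(type(scalar_result).__name__, scalar_result)
print(mask.tolist())
```

Only the second form is usable as the row indexer in tdf.loc[mask, 'alone'].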


If the values cannot be negative, use numpy.where:

tdf['alone'] = np.where(tdf['alone'] == 0,  'alone', 'with family')
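
The restriction to non-negative values matters because numpy.where has only two branches: everything that is not 0, including a negative number, ends up as 'with family'. If negative values could occur, one possible alternative is numpy.select with an explicit default. This is a sketch, not part of the original answer; the Series s and the label 'unknown' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one negative value to show the difference.
s = pd.Series([4, 0, -1])

# Two-way choice: the negative value falls into the "else" branch.
two_way = np.where(s == 0, 'alone', 'with family')

# Three-way choice: rows matching no condition get the default label.
three_way = np.select([s > 0, s == 0],
                      ['with family', 'alone'],
                      default='unknown')

print(two_way.tolist())
print(three_way.tolist())
```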


Sample:

import numpy as np
import pandas as pd

tdf = pd.DataFrame({'alone':[4,4,5,0,5,0],
                   'col':[1,1,9,4,2,3]})

print (tdf)
   alone  col
0      4    1
1      4    1
2      5    9
3      0    4
4      5    2
5      0    3

tdf['alone'] = np.where(tdf['alone'] == 0,  'alone', 'with family')
print (tdf)

         alone  col
0  with family    1
1  with family    1
2  with family    9
3        alone    4
4  with family    2
5        alone    3


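Your original approach is also wrong, even with a boolean mask added, because it is a chained assignment: tdf['alone'] may return a copy, so the update is applied to that copy and you never see it in tdf: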

#added boolean mask tdf['alone'] > 0
tdf['alone'].loc[tdf['alone'] > 0 ]= 'with family'
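
A way to avoid the chained form is to do the selection and the assignment in a single loc call on the DataFrame itself. The sketch below writes the labels to a separate column, as the question describes; the column name 'status' is hypothetical. Writing to a separate column also avoids putting strings into an integer column, which recent pandas versions warn about.

```python
import pandas as pd

tdf = pd.DataFrame({'alone': [4, 4, 5, 0, 5, 0],
                    'col': [1, 1, 9, 4, 2, 3]})

# Start with the default label in a new column (hypothetical name 'status').
tdf['status'] = 'with family'

# One loc call with row mask and column label: no intermediate copy involved.
tdf.loc[tdf['alone'] == 0, 'status'] = 'alone'

print(tdf)
```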
