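I am running a conditional assignment to create a new column in my DataFrame (tdf) based on the values in the column 'alone'.
If the value is 0, the new value should be the string 'alone'; otherwise it should be 'with family'.
I am using this code: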
tdf['alone'].loc[['alone'] >0]= 'with family'
tdf['alone'].loc[['alone'] ==0] = 'alone'
After running the lines above, I get the following error:
KeyError: 'cannot use a single bool to index into setitem'
I looked at this same question, and what I gathered is that I need to include a row_indexer, as in tdf['alone'].loc[[row_indexer,['alone']] = 'alone', but I am not sure how to obtain the value for row_indexer.

Best answer
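You need loc with boolean indexing and a boolean mask. Compare the DataFrame column tdf['alone'] with the value 0, not the one-item list ['alone']. The expression ['alone'] == 0 evaluates to the single bool False, which is what the KeyError is complaining about: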
tdf.loc[tdf['alone'] > 0, 'alone'] = 'with family'
tdf.loc[tdf['alone'] ==0, 'alone'] = 'alone'
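Since the question asks for a new column, here is a minimal sketch of the same boolean-mask pattern that writes into a separate column instead of overwriting 'alone'. The column name 'status' is hypothetical and not from the question. Both masks are computed before any assignment, so the numeric column is never compared after strings have been written into it.

```python
import pandas as pd

tdf = pd.DataFrame({'alone': [4, 4, 5, 0, 5, 0],
                    'col': [1, 1, 9, 4, 2, 3]})

# Build both boolean masks up front, while 'alone' is still numeric
with_family = tdf['alone'] > 0
is_alone = tdf['alone'] == 0

# 'status' is a hypothetical name for the new column;
# each assignment is a single .loc call (row mask, column label)
tdf.loc[with_family, 'status'] = 'with family'
tdf.loc[is_alone, 'status'] = 'alone'

print(tdf)
```

Rows matched by neither mask (for example negative values) would be left as missing values in 'status'.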
If the values cannot be negative, use numpy.where:
tdf['alone'] = np.where(tdf['alone'] == 0, 'alone', 'with family')
Sample:
import numpy as np
import pandas as pd

tdf = pd.DataFrame({'alone':[4,4,5,0,5,0],
                    'col':[1,1,9,4,2,3]})
print (tdf)
   alone  col
0      4    1
1      4    1
2      5    9
3      0    4
4      5    2
5      0    3
tdf['alone'] = np.where(tdf['alone'] == 0, 'alone', 'with family')
print (tdf)
         alone  col
0  with family    1
1  with family    1
2  with family    9
3        alone    4
4  with family    2
5        alone    3
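The numpy.where version labels every non-zero value 'with family', which is why it is only suggested when the values cannot be negative. If negative values can occur, one possible alternative is numpy.select, which takes a list of conditions and a matching list of choices. The sketch below is an illustration only: the sample data, the 'status' column name, and the 'unknown' fallback label are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data that includes a negative value
tdf = pd.DataFrame({'alone': [4, -1, 5, 0]})

# Conditions are checked in order; the first match wins
conditions = [tdf['alone'] == 0, tdf['alone'] > 0]
choices = ['alone', 'with family']

# Rows matching no condition get the (assumed) fallback label
tdf['status'] = np.select(conditions, choices, default='unknown')

print(tdf)
```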
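Even with the boolean mask added, your approach is still wrong because it is a chained assignment: tdf['alone'] can return a copy, so the assignment may update that copy and you will never see the change in tdf:
#added boolean mask tdf['alone'] > 0
tdf['alone'].loc[tdf['alone'] > 0 ]= 'with family'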