Problem description
I am working with the Open Food Facts dataset, which is very messy. There is a column called quantity that contains information about the quantity of the respective food. The entries look like:
365 g (314 ml)
992 g
2.46 kg
0,33 litre
15.87oz
250 ml
1 L
33 cl
... and so on (very messy!!!). I want to create a new column called is_liquid. My idea is that if the quantity string contains an l or an L, the is_liquid field in that row should get a 1, and otherwise a 0. Here is what I've tried. I wrote this function:
def is_liquid(x):
    # When called via Series.apply, x is a single str, so x.str raises AttributeError
    if x.str.contains('l'):
        return 1
    elif x.str.contains('L'):
        return 1
    else:
        return 0
(BTW: if something is measured in 'oz' is it liquid?)
Then I tried to apply it:
df['is_liquid'] = df['quantity'].apply(is_liquid)
But all I get is this error:
AttributeError: 'str' object has no attribute 'str'
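The error comes from how Series.apply works: it calls the function once per element, so x is a plain Python str (or a missing value), and a str has no .str accessor. A minimal sketch of the same per-element idea, rewritten to operate on a single string; the sample frame is illustrative, and treating missing or non-string quantities as 0 is an assumption, not something the original specifies.

```python
import pandas as pd

def is_liquid(x):
    # Series.apply passes one element at a time, so x is a plain str here
    # (or None/NaN for missing entries), not a Series.
    if not isinstance(x, str):
        return 0  # assumption: missing / non-string quantities count as "not liquid"
    # Lower-casing once covers both 'l' and 'L'.
    return 1 if 'l' in x.lower() else 0

# Illustrative sample built from the entries listed in the question, plus one missing value.
df = pd.DataFrame({'quantity': ['365 g (314 ml)', '992 g', '2.46 kg', '0,33 litre',
                                '15.87oz', '250 ml', '1 L', '33 cl', None]})
df['is_liquid'] = df['quantity'].apply(is_liquid)
print(df)
```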
Can anyone help me?
Recommended answer
Use str.contains with case=False to get a boolean mask, and convert it to integers with Series.astype:
df['is_liquid']= df['liquids'].str.contains('L', case=False).astype(int)
print(df)
liquids is_liquid
0 365 g (314 ml) 1
1 992 g 0
2 2.46 kg 0
3 0,33 litre 1
4 15.87oz 0
5 250 ml 1
6 1 L 1
7 33 cl 1
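Two caveats apply to the rule above on real, messy data. First, a column with missing values makes str.contains return NaN for those rows, and astype(int) then fails; passing na=False maps them to False. Second, "any l" also matches non-volume units such as "lb". On the BTW question: "oz" on its own is ambiguous (the mass ounce), whereas "fl oz" is the fluid ounce, a volume unit. Below is a sketch under those assumptions. The stricter regex, the is_liquid_strict column, and the extra "2 lb" and missing rows are hypothetical illustrations, not part of the original answer.

```python
import pandas as pd

# Illustrative sample: entries from the question plus a weight in pounds and a missing value.
df = pd.DataFrame({'quantity': ['365 g (314 ml)', '992 g', '2.46 kg', '0,33 litre',
                                '15.87oz', '250 ml', '1 L', '33 cl', '2 lb', None]})

# Rule from the answer: any 'l' or 'L'. na=False turns missing values into False
# so that astype(int) does not fail on NaN.
df['is_liquid'] = df['quantity'].str.contains('l', case=False, na=False).astype(int)

# Hypothetical stricter rule: a number followed by a volume unit
# (ml, cl, dl, l, litre/liter, fl oz) that ends at a word boundary,
# so '2 lb' is no longer counted as liquid.
volume_pattern = r'\d\s*(?:[mcd]?l|lit(?:re|er)s?|fl\.?\s*oz)\b'
df['is_liquid_strict'] = (
    df['quantity'].str.contains(volume_pattern, case=False, na=False).astype(int)
)
print(df)
```

The simple rule is easy to reason about; the regex trades that simplicity for fewer false positives, and its unit list would likely need extending for other spellings found in the data.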