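I have some addresses that I need to clean up.
As you can see, the address1 column has some entries that are just a number, when they should be a number plus a street name, like the first three rows.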
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})
print(df)
         address1       address2
0  15 Main Street       New York
1  10 High Street             LA
2  5 Other Street         London
3             NaN          Tokyo
4              15   Grove Street
5              12  Garden Street
I am trying to write a function that checks whether address1 is a number and, if it is, joins the number from address1 with the street name from address2, then clears address2.
My expected output is below. Rows 4 and 5 now have complete address1 entries:
           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN  <---
5  12 Garden Street       NaN  <---
Here is what I tried, using .apply():
def f(x):
    try:
        # if address1 is an int
        if isinstance(int(x['address1']), int):
            # create new address using address1 + address2
            newaddress = str(x['address1']) + ' ' + str(x['address2'])
            # delete address2
            x['address2'] = np.nan
            # return newaddress to address1 column
            return newadress
    except:
        pass
Applying the function:
df['address1'] = df.apply(f, axis=1)
However, the address1 column is now all None. I have tried a few variations of this function but cannot get it to work. Any suggestions would be appreciated.
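Best answer
You can avoid apply by using str.isdigit to select exactly the rows that need to be modified. Create a mask m that identifies those rows. Call agg on those rows to build a sub-DataFrame, then append it back to the rest of the original df.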
m = df.address1.astype(str).str.isdigit()
df1 = df[m].agg(' '.join, axis=1).to_frame('address1').assign(address2=np.nan)
Out[179]:
           address1  address2
4   15 Grove Street       NaN
5  12 Garden Street       NaN
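To see which rows the mask m actually selects, the intermediate steps can be printed on their own. This is a minimal sketch that rebuilds the question's df; the name as_text is only an illustrative helper. On the object-dtype column built here, astype(str) should turn the missing value into text, so str.isdigit() gives False for that row instead of a missing value and m can be used directly for boolean indexing.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
    'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street'],
})

# Cast to str first so the missing value does not leave a gap in the mask.
as_text = df['address1'].astype(str)  # as_text is a hypothetical helper name

# True only where every character of the entry is a digit.
m = as_text.str.isdigit()

print(m.tolist())
print(df[m])   # the rows that need fixing
print(df[~m])  # the rows that are left alone
```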
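Finally, append df1 back to the remaining rows of df:
df[~m].append(df1)  # pandas < 2.0 only; DataFrame.append was removed in pandas 2.0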
Out[200]:
           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN
5  12 Garden Street       NaN
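On pandas 2.0 and later, the same mask-and-rebuild approach can be written with pd.concat. The sketch below follows the steps of the answer but swaps in pd.concat for the final step; the name result and the trailing sort_index() are additions for illustration (sort_index only matters when the numeric-only rows are not already at the end of the frame).

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
    'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street'],
})

# Rows whose address1 consists only of digits.
m = df['address1'].astype(str).str.isdigit()

# Join number and street name for those rows and blank out address2.
df1 = df[m].agg(' '.join, axis=1).to_frame('address1').assign(address2=np.nan)

# Stack the untouched rows and the rebuilt rows, then restore the original order.
result = pd.concat([df[~m], df1]).sort_index()
print(result)
```

Depending on the pandas version, concatenating the all-NaN address2 column of df1 may emit a FutureWarning; the values in result are not affected.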
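If you still want to use apply, f needs to return outside the if block so that it returns both the unmodified and the modified rows:
def f(x):
    y = x.copy()
    try:
        # if address1 is an int
        if isinstance(int(x['address1']), int):
            # create new address using address1 + address2
            y['address1'] = str(x['address1']) + ' ' + str(x['address2'])
            # delete address2
            y['address2'] = np.nan
    except (ValueError, TypeError):
        # address1 is not a plain number (or is NaN): leave the row unchanged
        pass
    return y

df.apply(f, axis=1)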
Out[213]:
           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN
5  12 Garden Street       NaN
Note: apply is not supposed to modify the object passed to it, so I make y = x.copy(), then modify and return y.
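The all-None column in the question comes from how Python treats a function that ends without a return statement: it returns None. The sketch below reproduces that on a two-row frame; broken, fixed and demo are hypothetical names, not the functions from the question. In the question's f, the misspelled name newadress additionally raises a NameError that the bare except swallows, which would explain why even the numeric rows came back as None.

```python
import pandas as pd

def broken(row):
    try:
        if isinstance(int(row['address1']), int):
            return str(row['address1']) + ' ' + str(row['address2'])
    except ValueError:
        pass
    # No return on this path: rows where int() failed yield None.

def fixed(row):
    try:
        int(row['address1'])
    except ValueError:
        # Not a bare number: hand the original value back unchanged.
        return row['address1']
    return str(row['address1']) + ' ' + str(row['address2'])

demo = pd.DataFrame({
    'address1': ['15 Main Street', '15'],
    'address2': ['New York', 'Grove Street'],
})

broken_out = demo.apply(broken, axis=1)  # first row is missing
fixed_out = demo.apply(fixed, axis=1)    # both rows carry a value
print(broken_out)
print(fixed_out)
```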