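I have some addresses that need cleaning up.

As you can see, the address1 column contains some entries that are only a number, when they should be a number followed by a street name, as in the first three rows.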

import numpy as np
import pandas as pd

df = pd.DataFrame({'address1':['15 Main Street','10 High Street','5 Other Street',np.nan,'15','12'],
                  'address2':['New York','LA','London','Tokyo','Grove Street','Garden Street']})

print(df)

         address1       address2
0  15 Main Street       New York
1  10 High Street             LA
2  5 Other Street         London
3             NaN          Tokyo
4              15   Grove Street
5              12  Garden Street


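I am trying to write a function that checks whether address1 is a number and, if so, combines the number from address1 with the street name from address2, then clears address2.

This is my expected output. Rows 4 and 5 now have complete address1 entries: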

           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN <---
5  12 Garden Street       NaN <---


Here is what I tried with .apply():

def f(x):

    try:
        #if address1 is int
        if isinstance(int(x['address1']), int):

            # create new address using address1 + address 2
            newaddress = str(x['address1']) +' '+ str(x['address2'])

            # delete address2
            x['address2'] = np.nan

            # return newaddress to address1 column
            return newadress

    except:
        pass


Applying the function:

df['address1'] = df.apply(f,axis=1)


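However, every value in the address1 column is now None.

I have tried several variations of this function but cannot get it to work. Any suggestions would be appreciated.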

Best answer

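You can avoid apply by using str.isdigit to select exactly the rows that need to be modified. Create a mask m that identifies those rows, use agg on them to build a sub-dataframe, and finally append that sub-dataframe back to the rest of the original df.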

m = df.address1.astype(str).str.isdigit()
df1 = df[m].agg(' '.join, axis=1).to_frame('address1').assign(address2=np.nan)

Out[179]:
           address1  address2
4   15 Grove Street       NaN
5  12 Garden Street       NaN
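
To see what the mask m selects before any rows are rebuilt, it can be printed on its own. This is a minimal sketch that reuses the question's sample data; the variable mask_values is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})

# Casting to str first keeps the missing entry in row 3 from
# producing a missing value in the mask; it simply comes out False.
m = df['address1'].astype(str).str.isdigit()

# Plain Python list of booleans, one per row
mask_values = m.tolist()
print(mask_values)
```

Only rows 4 and 5 are True, so only those rows are passed to agg.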


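Finally, append df1 back to the remaining rows of df. DataFrame.append was removed in pandas 2.0, so pd.concat is used here:

pd.concat([df[~m], df1])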

Out[200]:
           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN
5  12 Garden Street       NaN
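
As a variation on the same mask idea, the selected rows can also be updated in place with .loc instead of building a sub-dataframe and putting it back. This is a minimal sketch, not the method used in the answer above; the name cleaned is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})

# Same mask as in the answer: True where address1 consists of digits only
m = df['address1'].astype(str).str.isdigit()

# Work on a copy so the original frame is left untouched
cleaned = df.copy()

# Join the house number and the street name for the masked rows only
cleaned.loc[m, 'address1'] = cleaned.loc[m, 'address1'] + ' ' + cleaned.loc[m, 'address2']

# Clear address2 for those rows now that it has been merged into address1
cleaned.loc[m, 'address2'] = np.nan

print(cleaned)
```

Because the rows are never removed and re-added, the original row order is kept without any extra sorting.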




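If you still insist on using apply, you need to modify f so that the return statement sits outside the if block. That way f returns both the unmodified rows and the modified rows.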

def f(x):
    y = x.copy()
    try:
        #if address1 is int
        if isinstance(int(x['address1']), int):

            # create new address using address1 + address 2
            y['address1'] = str(x['address1']) +' '+ str(x['address2'])

            # delete address2
            y['address2'] = np.nan
    except:
        pass

    return y


df.apply(f, axis=1)

Out[213]:
           address1  address2
0    15 Main Street  New York
1    10 High Street        LA
2    5 Other Street    London
3               NaN     Tokyo
4   15 Grove Street       NaN
5  12 Garden Street       NaN


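Note: it is recommended that a function passed to apply should not modify the object it receives, so I make y = x.copy(), modify y, and return it.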
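
A bare except also hides unrelated mistakes. In the question's function, the misspelled name newadress raises a NameError that is silently swallowed, which is part of why every row came back as None. The sketch below is one possible variant that catches only the errors int() raises for non-numeric or missing values; the function name merge_numeric_address and the variable result are hypothetical and introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'address1': ['15 Main Street', '10 High Street', '5 Other Street', np.nan, '15', '12'],
                   'address2': ['New York', 'LA', 'London', 'Tokyo', 'Grove Street', 'Garden Street']})

def merge_numeric_address(row):
    # Copy the row so the object passed in by apply is not modified
    out = row.copy()
    try:
        # Raises ValueError for text such as '15 Main Street' and for NaN
        int(row['address1'])
    except (ValueError, TypeError):
        # Not a plain number: return the row unchanged
        return out
    # address1 is a plain number: merge it with the street name
    out['address1'] = f"{row['address1']} {row['address2']}"
    out['address2'] = np.nan
    return out

result = df.apply(merge_numeric_address, axis=1)
print(result)
```

With the narrower except clause, a typo inside the function would surface as an error instead of quietly turning the column into None.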

07-24 09:53