I have a pandas DataFrame and am using the tldextract library. I'm having trouble creating a new column that joins the second and third extracted strings.
#First 5 rows for testing purposes
df = pd.DataFrame(request['destinationhostname'].iloc[0:5])
destinationhostname
0 pod51042psh.outlook.com
1 s.mrmserve.com
2 client-office365-tas.msedge.net
3 otf.msn.com
4 log.pinterest.com
#Applying tld extract on destinationhostname column
df['req'] = request.destinationhostname.apply(tldextract.extract)
destinationhostname req
0 pod51042psh.outlook.com (pod51042psh, outlook, com)
1 s.mrmserve.com (s, mrmserve, com)
2 client-office365-tas.msedge.net (client-office365-tas, msedge, net)
3 otf.msn.com (otf, msn, com)
4 log.pinterest.com (log, pinterest, com)
I have tried many variations like the following, but I keep getting errors.
df['fld'] = df['req'].apply('.'.join[1:3])
TypeError: 'builtin_function_or_method' object has no attribute '__getitem__'
or
TypeError: sequence item 0: expected string, ExtractResult found
The output I want is:
destinationhostname req fld
0 pod51042psh.outlook.com (pod51042psh, outlook, com) outlook.com
1 s.mrmserve.com (s, mrmserve, com) mrmserve.com
2 client-office365-tas.msedge.net (client-office365-tas, msedge, net) msedge.net
3 otf.msn.com (otf, msn, com) msn.com
4 log.pinterest.com (log, pinterest, com) pinterest.com
Best answer
Slice each element with the str accessor, then join the slice with str.join:
df['fld'] = df.req.str[1:].str.join('.')
df
destinationhostname req fld
0 pod51042psh.outlook.com (pod51042psh, outlook, com) outlook.com
1 s.mrmserve.com (s, mrmserve, com) mrmserve.com
2 client-office365-tas.msedge.net (client-office365-tas, msedge, net) msedge.net
3 otf.msn.com (otf, msn, com) msn.com
4 log.pinterest.com (log, pinterest, com) pinterest.com
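To see why the original attempt failed, here is a minimal pure-Python sketch. FakeExtract is a hypothetical stand-in for the tuple-like result shown in the req column (an assumption for illustration, not the real tldextract class). Writing '.'.join[1:3] subscripts the join method itself, which raises a TypeError; the slice has to be taken from the tuple first and then passed to join.

```python
from collections import namedtuple

# Hypothetical stand-in for the tuple-like extract result (assumption for illustration).
FakeExtract = namedtuple("FakeExtract", ["subdomain", "domain", "suffix"])

row = FakeExtract("pod51042psh", "outlook", "com")

# The original attempt subscripts the method object, not a sequence.
try:
    '.'.join[1:3]
    subscript_failed = False
except TypeError:
    subscript_failed = True

# Slice the tuple first, then hand the slice to join.
fld = '.'.join(row[1:3])
```

The str accessor version above does the same two steps per row: .str[1:] slices each element, and .str.join('.') joins the resulting pieces.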
Or, as @coldspeed has shown, you can slice from the end of each sequence instead.
df['fld'] = df.req.str[-2:].str.join('.')
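Both slicing variants rely on each element of req behaving like a tuple. If the installed tldextract version returns a result object that does not support slicing (worth checking against its changelog), a possible alternative is to build the value from the named fields. The sketch below assumes the result exposes domain and suffix attributes, and uses a hypothetical FakeExtract namedtuple in place of the real result type so it runs without tldextract.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in with the same field names as the extract result (assumption).
FakeExtract = namedtuple("FakeExtract", ["subdomain", "domain", "suffix"])

df = pd.DataFrame({
    "req": [
        FakeExtract("pod51042psh", "outlook", "com"),
        FakeExtract("s", "mrmserve", "com"),
        FakeExtract("otf", "msn", "com"),
    ]
})

# Build the value from named fields rather than tuple positions.
df["fld"] = df["req"].apply(lambda r: f"{r.domain}.{r.suffix}")
```

This trades the vectorized-looking str accessor for an explicit per-row lambda, but it does not depend on the position of the fields.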
Source: "python - Pandas: join the last 2 comma-separated items in a cell using tldextract", a question on Stack Overflow: https://stackoverflow.com/questions/53725828/