This post covers a one-liner for creating a string column from multiple columns.
Problem description
Consider the following code:
import pandas as pd
df = pd.DataFrame({'col_1': [1, 2, 3, 4],
                   'col_2': ['a', 'b', 'c', 'd'],
                   'col_3': ['hey', 'ho', 'banana', 'go']})
col = (df['col_1'].astype(str) + '_' +
       df['col_2'].astype(str) + '_' +
       df['col_3'].astype(str))
col
Out[12]:
0 1_a_hey
1 2_b_ho
2 3_c_banana
3 4_d_go
dtype: object
Can anybody think of a one-liner that produces col using the list col_names = ['col_1', 'col_2', 'col_3'] as input?
That is, col_sum = something_smart(col_names). It should obviously also work for a different set of columns, for example:
different_col_set = ['col_2', 'col_3']
something_smart(different_col_set)
Out[13]:
0 a_hey
1 b_ho
2 c_banana
3 d_go
dtype: object
The point is that col_names is a list containing any subset of the dataframe's column names.
Recommended answer
Option 1] Using apply
You can apply '_'.join to each row:
In [5521]: df[col_names].astype(str).apply('_'.join, axis=1)
Out[5521]:
0 1_a_hey
1 2_b_ho
2 3_c_banana
3 4_d_go
dtype: object
and
In [5523]: df[different_col_set].astype(str).apply('_'.join, axis=1)
Out[5523]:
0 a_hey
1 b_ho
2 c_banana
3 d_go
dtype: object
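Why this works: with axis=1, apply hands each row to the function as a Series, and '_'.join iterates over that row's values, which astype(str) has already turned into strings. Below is a minimal sketch that wraps the expression in a helper. The name join_columns_apply and its sep parameter are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 3, 4],
                   'col_2': ['a', 'b', 'c', 'd'],
                   'col_3': ['hey', 'ho', 'banana', 'go']})

def join_columns_apply(frame, names, sep='_'):
    # Hypothetical helper: cast the selected columns to str,
    # then join the values of each row with the separator.
    return frame[names].astype(str).apply(sep.join, axis=1)

col = join_columns_apply(df, ['col_1', 'col_2', 'col_3'])
print(col.tolist())  # ['1_a_hey', '2_b_ho', '3_c_banana', '4_d_go']
```

Because the column list is an ordinary argument, the same helper handles any subset, such as ['col_2', 'col_3'].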
Option 2] Using reduce, which is faster than apply in this case. In Python 3, reduce must first be imported with from functools import reduce.
In [5527]: reduce(lambda x, y: x + '_' + y, [df[c].astype(str) for c in col_names])
Out[5527]:
0 1_a_hey
1 2_b_ho
2 3_c_banana
3 4_d_go
dtype: object
In [5528]: reduce(lambda x, y: x + '_' + y, [df[c].astype(str) for c in different_col_set])
Out[5528]:
0 a_hey
1 b_ho
2 c_banana
3 d_go
dtype: object
This gives the same result as reduce(lambda x, y: x.astype(str) + '_' +y.astype(str), [df[x] for x in col_names])
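To make the fold explicit: reduce combines the list of Series from left to right, so three columns become (s1 + '_' + s2) + '_' + s3, with every + operating on whole columns at once. A minimal sketch, assuming Python 3 (where reduce lives in functools); the helper name join_columns_reduce is hypothetical.

```python
from functools import reduce  # in Python 3, reduce is no longer a builtin

import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 3, 4],
                   'col_2': ['a', 'b', 'c', 'd'],
                   'col_3': ['hey', 'ho', 'banana', 'go']})

def join_columns_reduce(frame, names, sep='_'):
    # Hypothetical helper: build one string Series per column,
    # then fold them pairwise with vectorized concatenation.
    series = [frame[c].astype(str) for c in names]
    return reduce(lambda x, y: x + sep + y, series)

col = join_columns_reduce(df, ['col_1', 'col_2', 'col_3'])

# Written out by hand, the fold is exactly the expression from the question.
manual = (df['col_1'].astype(str) + '_' +
          df['col_2'].astype(str) + '_' +
          df['col_3'].astype(str))
print(col.equals(manual))  # True
```

Note that reduce raises a TypeError on an empty list when no initial value is given, so this sketch expects at least one column name.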
Timings
In [5556]: df.shape
Out[5556]: (10000, 3)
In [5553]: %timeit reduce(lambda x, y: x + '_' + y, [df[c].astype(str) for c in col_names])
10 loops, best of 3: 21.7 ms per loop
In [5554]: %timeit reduce(lambda x, y: x.astype(str) + '_' +y.astype(str), [df[x] for x in col_names])
10 loops, best of 3: 22.3 ms per loop
In [5555]: %timeit df[col_names].astype(str).apply('_'.join, axis=1)
1 loop, best of 3: 254 ms per loop
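The comparison can be repeated outside IPython with the standard timeit module. The sketch below is only an approximation of the setup above: the original 10000-row frame is not shown, so it assumes the 4-row example tiled 2500 times, and the function names with_reduce and with_apply are hypothetical. Absolute numbers will vary with the machine and the pandas version.

```python
import timeit
from functools import reduce

import pandas as pd

small = pd.DataFrame({'col_1': [1, 2, 3, 4],
                      'col_2': ['a', 'b', 'c', 'd'],
                      'col_3': ['hey', 'ho', 'banana', 'go']})
# Assumption: tile the 4-row example to reach the (10000, 3) shape.
big = pd.concat([small] * 2500, ignore_index=True)
col_names = ['col_1', 'col_2', 'col_3']

def with_reduce():
    # Column-wise: a handful of vectorized string concatenations.
    return reduce(lambda x, y: x + '_' + y,
                  [big[c].astype(str) for c in col_names])

def with_apply():
    # Row-wise: one Python-level join call per row.
    return big[col_names].astype(str).apply('_'.join, axis=1)

loops = 3
t_reduce = min(timeit.repeat(with_reduce, number=loops, repeat=3)) / loops
t_apply = min(timeit.repeat(with_apply, number=loops, repeat=3)) / loops
print(f"reduce: {t_reduce * 1e3:.1f} ms per loop")
print(f"apply:  {t_apply * 1e3:.1f} ms per loop")
```

The gap reported above is consistent with how the two approaches work: reduce performs a few operations over whole columns, while apply with axis=1 calls the join function once per row.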