What is the best way to do string matching on a column of lists?
For example, I have a dataset:
import numpy as np
import pandas as pd
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id':range(3), 'L':[np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df
L id
0 [tackle, apple, grapple] 0
1 [tackle, snapple, satchel] 1
2 [satchel, satchel, tackle] 2
I want to return the rows where any item in L matches a string. For example, "grap" should return row 0, and "sat" should return rows 1:2.

Best answer:
Let's use this setup:
np.random.seed(123)
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id':range(3), 'L':[np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df
L id
0 [tackle, snapple, tackle] 0
1 [grapple, satchel, tackle] 1
2 [satchel, grapple, grapple] 2
Use any and apply:
df[df.L.apply(lambda x: any('grap' in s for s in x))]
Output:
L id
1 [grapple, satchel, tackle] 1
2 [satchel, grapple, grapple] 2
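For anyone who wants to try this without depending on the random seed, here is a minimal sketch using the fixed rows printed in the question. The helper name rows_with_substring is hypothetical (it is not part of pandas); it only wraps the any/apply mask shown above.

```python
import pandas as pd

# Fixed data copied from the frame printed in the question (no randomness).
df = pd.DataFrame({
    'id': range(3),
    'L': [['tackle', 'apple', 'grapple'],
          ['tackle', 'snapple', 'satchel'],
          ['satchel', 'satchel', 'tackle']],
})

def rows_with_substring(frame, column, needle):
    """Hypothetical helper: keep rows whose list in `column`
    has at least one item containing `needle` as a substring."""
    mask = frame[column].apply(lambda items: any(needle in s for s in items))
    return frame[mask]

grap_rows = rows_with_substring(df, 'L', 'grap')
sat_rows = rows_with_substring(df, 'L', 'sat')

print(grap_rows.index.tolist())  # [0]
print(sat_rows.index.tolist())   # [1, 2]
```

On the question's data this gives row 0 for "grap" and rows 1 and 2 for "sat", which is what was asked for.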
Timings:
%timeit df.L.apply(lambda x: any('grap' in s for s in x))
10000 loops, best of 3: 194 µs per loop
%timeit df.L.apply(lambda i: ','.join(i)).str.contains('grap')
1000 loops, best of 3: 481 µs per loop
%timeit df.L.str.join(', ').str.contains('grap')
1000 loops, best of 3: 529 µs per loop
Source: "python - Python: string matching on a pandas column of lists" on Stack Overflow: https://stackoverflow.com/questions/47441980/