问题描述
我有以下数据框。
,我有一个输入值列表
我要匹配将输入列表添加到数据框中的符号和同义词列,并仅提取输入值出现在符号列或同义词列中的那些行(请注意,此处的值用 |符号分隔)。
I want to match each item from the input list to the Symbol and Synonym column in the data-frame and to extract only those rows where the input value appears in either the Symbol column or Synonym column(Please note that here the values are separated by '|' symbol).
在输出数据帧中,我需要附加一列Input_symbol来表示匹配值。因此,在这种情况下,所需的输出应类似于图像波纹管。
In the output data-frame I need an additional column Input_symbol which denotes the matching value. So here in this case the desired output will should be like the image bellow.
我该怎么做?。
推荐答案
问题已更改。您现在想要做的是浏览两列(符号和同义词),如果您发现 mylist
内部的值,请返回该值。如果不匹配,则可以返回不匹配!。(例如)。
The question has changed. What you want to do now is to look through the two columns (Symbol and Synonyms) and if you find a value that is inside mylist
return it. If no match you can return 'No match!' (for instance).
import pandas as pd
import io
s = '''\
Symbol,Synonyms
A1BG,A1B|ABG|GAB|HYST2477
A2M,A2MD|CPAMD5|FWP007|S863-7
A2MP1,A2MP
NAT1,AAC1|MNAT|NAT-1|NATI
NAT2,AAC2|NAT-2|PNAT
NATP,AACP|NATP1
SERPINA3,AACT|ACT|GIG24|GIG25'''
mylist = ['GAB', 'A2M', 'GIG24']
df = pd.read_csv(io.StringIO(s))
# Store the lookup serie
lookup_serie = df['Symbol'].str.cat(df['Synonyms'],'|').str.split('|')
# Create lambda function to return first value from mylist, No match! if stop-iteration
f = lambda x: next((i for i in x if i in mylist), 'No match!')
df.insert(0,'Input_Symbol',lookup_serie.apply(f))
print(df)
返回
Input_Symbol Symbol Synonyms
0 GAB A1BG A1B|ABG|GAB|HYST2477
1 A2M A2M A2MD|CPAMD5|FWP007|S863-7
2 No match! A2MP1 A2MP
3 No match! NAT1 AAC1|MNAT|NAT-1|NATI
4 No match! NAT2 AAC2|NAT-2|PNAT
5 No match! NATP AACP|NATP1
6 GIG24 SERPINA3 AACT|ACT|GIG24|GIG25
旧解决方案:
f = lambda x: [i for i in x.split('|') if i in mylist] != []
m1 = df['Symbol'].apply(f)
m2 = df['Synonyms'].apply(f)
df[m1 | m2]
这篇关于选择与正则表达式匹配的 pandas 行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!