Selecting pandas rows that match a regular expression


Problem description

I have the following data frame and a list of input values.

I want to match each item in the input list against the Symbol and Synonyms columns of the data frame, and extract only the rows where the input value appears in either the Symbol column or the Synonyms column (note that the values in these columns are separated by the '|' symbol).
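Splitting on '|' matters because some symbols are prefixes of others: in the sample data used in the answer, A2M is a prefix of both A2MP1 and A2MD. A minimal sketch contrasting plain substring matching with whole-token matching, using two rows of that sample data (the names combined, substring_hit and token_hit are illustrative):

```python
import pandas as pd

# Two rows taken from the answer's sample data
df = pd.DataFrame({
    "Symbol": ["A2M", "A2MP1"],
    "Synonyms": ["A2MD|CPAMD5|FWP007|S863-7", "A2MP"],
})

# Join both columns so every symbol and synonym sits in one '|'-separated string
combined = df["Symbol"] + "|" + df["Synonyms"]

# Substring matching: 'A2M' also occurs inside 'A2MP1', so both rows match
substring_hit = combined.str.contains("A2M", regex=False)

# Token matching: split on '|' and compare whole tokens only
token_hit = combined.str.split("|").apply(lambda toks: "A2M" in toks)

print(substring_hit.tolist())  # [True, True]
print(token_hit.tolist())      # [True, False]
```

Only the token-based check keeps A2MP1 out of the result, which is what the question asks for.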

In the output data frame I need an additional column, Input_symbol, that denotes the matching value. So in this case the desired output should look like the image below.

How can I do this?

Recommended answer

The question has changed. What you want to do now is look through the two columns (Symbol and Synonyms) and, if you find a value that is in mylist, return it. If there is no match, you can return 'No match!' (for instance).

import pandas as pd
import io

s = '''\
Symbol,Synonyms
A1BG,A1B|ABG|GAB|HYST2477
A2M,A2MD|CPAMD5|FWP007|S863-7
A2MP1,A2MP
NAT1,AAC1|MNAT|NAT-1|NATI
NAT2,AAC2|NAT-2|PNAT
NATP,AACP|NATP1
SERPINA3,AACT|ACT|GIG24|GIG25'''

mylist = ['GAB', 'A2M', 'GIG24']
df = pd.read_csv(io.StringIO(s))

# Build the lookup Series: one list per row holding the Symbol and every synonym
lookup_serie = df['Symbol'].str.cat(df['Synonyms'], sep='|').str.split('|')

# Return the first token that is in mylist; next() falls back to 'No match!' when there is none
f = lambda x: next((i for i in x if i in mylist), 'No match!')

df.insert(0,'Input_Symbol',lookup_serie.apply(f))
print(df)

  Input_Symbol    Symbol                   Synonyms
0          GAB      A1BG       A1B|ABG|GAB|HYST2477
1          A2M       A2M  A2MD|CPAMD5|FWP007|S863-7
2    No match!     A2MP1                       A2MP
3    No match!      NAT1       AAC1|MNAT|NAT-1|NATI
4    No match!      NAT2            AAC2|NAT-2|PNAT
5    No match!      NATP                 AACP|NATP1
6        GIG24  SERPINA3       AACT|ACT|GIG24|GIG25
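Although the title mentions regular expressions, the answer above splits strings rather than using a regex. A regex variant is possible. The following is a sketch, assuming the values in mylist must match whole '|'-delimited tokens. It uses Series.str.extract with a pattern anchored on '|' or the string boundaries, so A2M does not match A2MP1 or A2MD. The names alternation, pattern and combined are illustrative and do not come from the answer.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Symbol": ["A1BG", "A2M", "A2MP1", "NAT1", "NAT2", "NATP", "SERPINA3"],
    "Synonyms": ["A1B|ABG|GAB|HYST2477", "A2MD|CPAMD5|FWP007|S863-7", "A2MP",
                 "AAC1|MNAT|NAT-1|NATI", "AAC2|NAT-2|PNAT", "AACP|NATP1",
                 "AACT|ACT|GIG24|GIG25"],
})
mylist = ["GAB", "A2M", "GIG24"]

# One alternation of the escaped input values; the token must start at the
# beginning of the string or after '|', and end before '|' or the string end
alternation = "|".join(re.escape(v) for v in mylist)
pattern = rf"(?:^|\|)({alternation})(?=\||$)"

# Same combined string as the answer's lookup Series, before splitting
combined = df["Symbol"] + "|" + df["Synonyms"]

# extract returns the leftmost matching token per row, NaN where nothing matches
df.insert(0, "Input_Symbol",
          combined.str.extract(pattern, expand=False).fillna("No match!"))
print(df)
```

re.escape protects against input values that contain regex metacharacters. Because str.extract returns the leftmost match, the result should agree with the next() lambda, which also returns the first matching token in row order.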

Old solution:

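# True if any '|'-separated token of the cell is in mylist
f = lambda x: any(i in mylist for i in x.split('|'))

m1 = df['Symbol'].apply(f)
m2 = df['Synonyms'].apply(f)

# Keep the rows where either column contains a match
print(df[m1 | m2])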
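The boolean-mask filter of the old solution can also be expressed as a regex with Series.str.contains. This is a sketch assuming the values must match whole '|'-delimited tokens; the names alternation, pattern and mask are illustrative. Only non-capturing groups (?:...) are used, because str.contains warns when the pattern contains capture groups.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Symbol": ["A1BG", "A2M", "A2MP1", "NAT1", "NAT2", "NATP", "SERPINA3"],
    "Synonyms": ["A1B|ABG|GAB|HYST2477", "A2MD|CPAMD5|FWP007|S863-7", "A2MP",
                 "AAC1|MNAT|NAT-1|NATI", "AAC2|NAT-2|PNAT", "AACP|NATP1",
                 "AACT|ACT|GIG24|GIG25"],
})
mylist = ["GAB", "A2M", "GIG24"]

# Whole-token pattern built from non-capturing groups only
alternation = "|".join(re.escape(v) for v in mylist)
pattern = rf"(?:^|\|)(?:{alternation})(?:\||$)"

# Keep a row if either column contains one of the values as a whole token
mask = df["Symbol"].str.contains(pattern) | df["Synonyms"].str.contains(pattern)
result = df[mask]
print(result)
```

This should select the same rows (A1BG, A2M and SERPINA3) as the apply-based mask, without calling a Python lambda per cell.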
