python - Filter a DataFrame on multiple columns in Pandas where the column names contain a pattern

For filtering on multiple columns, I have seen examples where rows are filtered with something like df[df['A'].str.contains("string") | df['B'].str.contains("string")].
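As a rough illustration of that pattern (the columns 'A' and 'B' and their values here are hypothetical), the explicit `|` of two `str.contains` results gives the same row mask as testing a list of columns and OR-ing them with `.any(axis=1)`, which is the form that scales when the column names are not known in advance:

```python
import pandas as pd

# Hypothetical data; only the search text "string" comes from the question.
df = pd.DataFrame({
    "A": ["a string", "nothing", "other"],
    "B": ["none", "string here", "other"],
})

# Two explicit conditions combined with | (element-wise OR).
mask_or = df["A"].str.contains("string") | df["B"].str.contains("string")

# Same test for an arbitrary list of columns: one boolean column per
# input column, then .any(axis=1) ORs them together row by row.
cols = ["A", "B"]
mask_any = df[cols].apply(lambda s: s.str.contains("string")).any(axis=1)

print(df[mask_or])
```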

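I have multiple files. For each file, I want to keep only the rows where one of the columns whose name contains 'email' holds a value containing the string 'gmail.com'.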

So a sample header could look like this: 'firstname' 'lastname' 'companyname' 'address' 'emailid1' 'emailid2' 'emailid3' ...

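The emailid1, emailid2, emailid3 columns hold email IDs, some of which are gmail.com addresses. I want the rows where gmail may appear in any one of these columns.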
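Selecting the email columns by name can be done in a few equivalent ways. A small sketch using the header from the question (the empty frame is only a stand-in for a real file): the list comprehension from the question's code, `DataFrame.filter(like=...)` (substring match on column labels), and `DataFrame.filter(regex=...)` for a stricter pattern all pick out the same three columns here.

```python
import pandas as pd

# Empty stand-in frame using the header names from the question.
df = pd.DataFrame(columns=["firstname", "lastname", "companyname", "address",
                           "emailid1", "emailid2", "emailid3"])

# List comprehension, as in the question's code.
by_comprehension = [col for col in df.columns if "email" in col]

# filter(like=...) keeps column labels that contain the substring.
by_like = df.filter(like="email").columns.tolist()

# filter(regex=...) is stricter: only labels of the form emailid<digits>.
by_regex = df.filter(regex=r"^emailid\d+$").columns.tolist()

print(by_comprehension)
```

The regex form may be preferable if other, unrelated columns also happen to contain "email" in their names.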

import pandas as pd

# files: the list of file names in the Reduced/ directory
for file in files:
    pdf = pd.read_csv('Reduced/' + file, delimiter='\t')
    emailids = [col for col in pdf.columns if 'email' in col]
    # pdf['gmail' in pdf[emailids]]
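One possible way to finish the loop, sketched under assumptions: the file names and contents below are hypothetical and read from in-memory strings via `io.StringIO` so the example runs on its own; with real files you would pass the path under 'Reduced/' to `pd.read_csv` instead. Each email column is converted to text first so that a column read in as all-missing (float dtype) does not break the `.str` accessor.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the tab-separated files under 'Reduced/'.
files = {
    "part1.tsv": "firstname\temailid1\temailid2\n"
                 "Ann\tann@example.org\tann@gmail.com\n"
                 "Bob\tbob@example.org\tbob@example.net\n",
    "part2.tsv": "firstname\temailid1\n"
                 "Cid\tcid@gmail.com\n",
}

results = {}
for name, text in files.items():
    pdf = pd.read_csv(io.StringIO(text), sep="\t")
    emailids = [col for col in pdf.columns if "email" in col]
    # One boolean per cell: does the value contain the literal text "gmail.com"?
    # regex=False keeps "." literal; na=False counts missing values as no match.
    hits = pdf[emailids].apply(
        lambda s: s.astype(str).str.contains("gmail.com", regex=False, na=False)
    )
    # Keep a row if any of its email columns matched.
    results[name] = pdf[hits.any(axis=1)]

for name, matched in results.items():
    print(name, matched["firstname"].tolist())
```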

Best Answer

Given the sample input:

df = pd.DataFrame({'email': ['[email protected]', '[email protected]'], 'somethingelse': [1, 2], 'another_email': ['[email protected]', '[email protected]']})

which looks like:

           another_email              email  somethingelse
0   [email protected]   [email protected]              1
1  [email protected]  [email protected]              2

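You can filter down to the columns whose names contain email, check them for gmail.com (or whatever text you need), and then take the matching subset, e.g.: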

df[df.filter(like='email').applymap(lambda L: 'gmail.com' in L).any(axis=1)]

This gives you:

           another_email              email  somethingelse
1  [email protected]  [email protected]              2
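Note that `DataFrame.applymap` was deprecated in pandas 2.1 in favor of the renamed `DataFrame.map`, and the lambda above fails on missing values (`'gmail.com' in None` raises a TypeError). A column-wise alternative is `Series.str.contains` with `regex=False` (so the "." is matched literally) and `na=False` (so missing addresses count as no match). The addresses below are hypothetical, since the ones in the example above are redacted:

```python
import pandas as pd

# Hypothetical addresses, including missing values.
df = pd.DataFrame({
    "email": ["alice@example.com", None, "carol@gmail.com"],
    "somethingelse": [1, 2, 3],
    "another_email": ["alice@example.org", "bob@gmail.com", None],
})

# Only the columns whose names contain "email".
emails = df.filter(like="email")

# Vectorised substring test per column, then OR across columns per row.
mask = emails.apply(
    lambda s: s.str.contains("gmail.com", regex=False, na=False)
).any(axis=1)

print(df[mask])
```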

A similar question about python - filtering a DataFrame on multiple columns in Pandas where the column names contain a pattern can be found on Stack Overflow: https://stackoverflow.com/questions/39348317/