This article shows how to AND (&) together a list of boolean pandas DataFrames.

Problem description

I have a pandas DataFrame:

import pandas as pd
import numpy as np
from random import sample, randrange
from functools import reduce

N = 200
df = pd.DataFrame({'Rating':    np.random.choice(range(100), N),
                   'Treatment': np.random.choice(range(1, 10), N),
                   'Trial':     np.random.choice(range(1, 20), N),
                   'Name':      np.random.choice(list("ABCDEF"), N),
                   'Target':    np.random.choice(list("JKLMNOP"), N),
                   'Part':      np.random.choice(list("WXYZ"), N),
                   })

In my application, the user can make a selection, but for now let's select some random values:

>>> categories = [sorted(df[column].unique()) for column in df.columns.values]
>>> print(categories)
[['A', 'B', 'C', 'D', 'E', 'F'], ['W', 'X', 'Y', 'Z'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 78, 79, 80, 82, 84, 85, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], ['J', 'K', 'L', 'M', 'N', 'O', 'P'], [1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
>>> selected = [sample(category, k=3) for category in categories]
>>> print(selected)
[['F', 'D', 'C'], ['X', 'Z', 'W'], [36, 35, 16], ['O', 'N', 'P'], [8, 1, 9], [7, 11, 8]]

Now I want to select the row(s) in my DataFrame where, for each column, the value of the cell is in the selection. What I came up with is:

>>> df[reduce((lambda x, y: x & y), [df[column].isin(selection) for (column, selection) in zip(df.columns.values, selected)])]
    Name Part  Rating Target  Treatment  Trial
173    D    Z      35      O          9      7

This works, but it doesn't look very Pythonic. Is there a better way to do this?

Solution

You can use numpy.logical_and.reduce:

df[np.logical_and.reduce([df[i].isin(j) for i, j in zip(df.columns, selected)])]

This applies an element-wise AND across the list of Boolean arrays, reducing it to a single Boolean array that is then used to index df.
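To make the reduction easier to check by hand, here is a minimal sketch on a small fixed DataFrame instead of the random one from the question. The data and the selected dict are made up for illustration; the question keeps the selection in a list aligned with df.columns, whereas a dict keyed by column name is assumed here so that the pairing is explicit. The last line shows DataFrame.isin with a dict as a possible alternative, which only gives the same answer when every column of df has an entry in the dict.

```python
import numpy as np
import pandas as pd

# Small, deterministic stand-in for the random DataFrame in the question.
df = pd.DataFrame({
    "Name":   ["A", "D", "C", "F"],
    "Part":   ["W", "Z", "Y", "X"],
    "Rating": [10, 35, 36, 16],
})

# Hypothetical user selection: allowed values per column (illustrative only).
selected = {
    "Name":   ["C", "D", "F"],
    "Part":   ["W", "X", "Z"],
    "Rating": [16, 35, 36],
}

# Build one Boolean Series per column, then AND them element-wise.
masks = [df[column].isin(values) for column, values in selected.items()]
mask = np.logical_and.reduce(masks)

# Keep only the rows where every column's value is in its selection.
result = df[mask]
print(result)

# Possible alternative: DataFrame.isin accepts a dict keyed by column name.
# Columns missing from the dict would be all False, so this matches the
# reduction above only because every column has an entry.
alt_mask = df.isin(selected).all(axis=1)
```

Rows 1 and 3 are the only ones where all three values fall inside their selections, so both masks keep exactly those rows.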


10-31 15:04