我有一个DataFrame,其中某些商品的价格范围:

   price_low  price_high item
0         10          20    a
1          1           7    b
2         10          12    c
3         20          25    d
4          4           8    e
5          5          30    f
6         16          26    g


如何确定给定价格范围(例如$ 8-$ 16)的哪些商品重叠?

预期产量:

   price_low  price_high item
0         10          20    a
2         10          12    c
4          4           8    e
5          5          30    f
6         16          26    g

最佳答案

从熊猫0.24.0开始,您可以使用IntervalArray.overlaps方法,或者使用IntervalIndex.overlaps方法:

# construct the IntervalArray
price_ivs = pd.arrays.IntervalArray.from_arrays(df['price_low'], df['price_high'], closed='both')

# define desired price Interval and use the overlaps method to restrict df
my_price = pd.Interval(8, 16, closed='both')
df = df[price_ivs.overlaps(my_price)]


为了进行交互式演示,请首先构造示例数据:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.0rc1'

In [2]: df = pd.DataFrame({
   ...:     'price_low': [10, 1, 10, 20, 4, 5, 16],
   ...:     'price_high': [20, 7, 12, 25, 8, 30, 26],
   ...:     'item': list('abcdefg')
   ...: })

In [3]: df
Out[3]:
   price_low  price_high item
0         10          20    a
1          1           7    b
2         10          12    c
3         20          25    d
4          4           8    e
5          5          30    f
6         16          26    g


IntervalArray构造一个DataFrame

In [4]: price_ivs = pd.arrays.IntervalArray.from_arrays(
   ...:     df['price_low'], df['price_high'], closed='both')

In [5]: price_ivs
Out[5]:
IntervalArray([[10, 20], [1, 7], [10, 12], [20, 25], [4, 8], [5, 30], [16, 26]],
              closed='both',
              dtype='interval[int64]')


定义所需价格Interval并使用overlaps方法获取布尔索引器:

In [6]: my_price = pd.Interval(8, 16, closed='both')

In [7]: idxr = price_ivs.overlaps(my_price)

In [8]: idxr
Out[8]: array([ True, False,  True, False,  True,  True,  True])

In [9]: df[idxr]
Out[9]:
   price_low  price_high item
0         10          20    a
2         10          12    c
4          4           8    e
5          5          30    f
6         16          26    g


如果您已经有一列价格作为间隔(或间隔的Series),则可以使用array属性访问基础的IntervalArray并使用与上述相同的方法:

In [10]: df = pd.DataFrame({'price_ivs': price_ivs, 'item': list('abcdefg')})

In [11]: df
Out[11]:
  price_ivs item
0  [10, 20]    a
1    [1, 7]    b
2  [10, 12]    c
3  [20, 25]    d
4    [4, 8]    e
5   [5, 30]    f
6  [16, 26]    g

In [12]: idxr = df['price_ivs'].array.overlaps(my_price)

In [13]: idxr
Out[13]: array([ True, False,  True, False,  True,  True,  True])

In [14]: df[idxr]
Out[14]:
  price_ivs item
0  [10, 20]    a
2  [10, 12]    c
4    [4, 8]    e
5   [5, 30]    f
6  [16, 26]    g

08-25 02:11