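I am trying to subset a pandas DataFrame based on a binned category. (I know you can subset on the values themselves; this is just a stand-in for a different problem where I really do need to bin the data!) I think I am missing something about subsetting, but I can't find it in the documentation. Here is an example: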

import numpy as np
import pandas as pd

np.random.seed(9876)

# Generating random data for binning.
bin_step = 0.5
random_data = np.random.uniform(low = 0, high = 10, size = 30)

# Generating bin ranges
bin_ranges = np.arange(start = random_data.min(),
                       stop = random_data.max() + random_data.max()*0.1,
                       step = bin_step)

# Cutting the random data into predefined bins.
bins = pd.cut(random_data.tolist(),
              bin_ranges,
              right = True,
              include_lowest = True)

# Aggregating into a pandas DataFrame
random_data_pd = pd.Series(random_data.tolist(), name = 'values')
bins_transformed = pd.Series(bins, name = 'bins')

df = pd.concat([bins_transformed, random_data_pd], axis = 1)

When I subset on a bin, for example (5.086, 5.586], the comparison returns all False. Why doesn't this select the matching rows?
df.bins == '(5.086, 5.586]' #returns all false.

Best answer

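If I understand correctly, the cause is that you are using == against a value of a different type (a string) instead of a pd.Interval. Have a look at my example.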

print(type(df.bins[0]))

<class 'pandas._libs.interval.Interval'>
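Since each element is an Interval object, a string that merely looks like the printed bin cannot compare equal to it. Here is a minimal sketch of that mismatch on a small made-up series (the names toy_values and toy_bins and the integer bin edges are my own, chosen only to keep the intervals readable, and are not part of the question's data):

```python
import pandas as pd

# Hypothetical toy data with integer bin edges, so the intervals are easy to read.
toy_values = [0.5, 1.5, 2.5, 1.2]
toy_bins = pd.Series(pd.cut(toy_values, bins=[0, 1, 2, 3]), name="bins")

# Each element is a pd.Interval, not a string.
first_element = toy_bins.iloc[0]
print(type(first_element))

# Comparing against a string that looks like the printed bin never matches.
string_mask = toy_bins == "(1, 2]"

# Comparing against an Interval with the same endpoints does match.
interval_mask = toy_bins == pd.Interval(1, 2, closed="right")

print(string_mask.tolist())
print(interval_mask.tolist())
```

The string comparison is False everywhere, while the Interval comparison flags the two values that fall in (1, 2].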

print(df.bins)
print(df.bins == pd.Interval(5.1, 5.2))

0     (1.586, 2.086]
1     (6.086, 6.586]
2     (8.586, 9.086]
3     (7.586, 8.086]
4     (5.086, 5.586]
5     (0.585, 1.086]
6     (4.586, 5.086]
7     (1.086, 1.586]
8     (9.086, 9.586]
9     (4.586, 5.086]
10    (1.586, 2.086]
11    (1.086, 1.586]
12    (2.586, 3.086]
13    (2.586, 3.086]
14    (1.086, 1.586]
15    (8.086, 8.586]
16    (7.086, 7.586]
17    (6.586, 7.086]
18    (8.586, 9.086]
19    (7.586, 8.086]
20    (7.586, 8.086]
21    (0.585, 1.086]
22    (4.586, 5.086]
23    (9.086, 9.586]
24    (8.086, 8.586]
25    (6.586, 7.086]
26    (5.086, 5.586]
27    (6.586, 7.086]
28    (5.086, 5.586]
29    (9.086, 9.586]
Name: bins, dtype: category
Categories (19, interval[float64]): [(0.585, 1.086] < (1.086, 1.586] < (1.586, 2.086] <
                                     (2.086, 2.586] ... (8.086, 8.586] < (8.586, 9.086] <
                                     (9.086, 9.586] < (9.586, 10.086]]
0     False
1     False
2     False
3     False
4      True
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26     True
27    False
28     True
29    False
Name: bins, dtype: bool

Subsetting:
print(df[df.bins == pd.Interval(5.1, 5.2)])

              bins    values
4   (5.086, 5.586]  5.132422
26  (5.086, 5.586]  5.309666
28  (5.086, 5.586]  5.574920
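One caveat: the output above comes from a pandas version in which an Interval that merely overlaps a bin is treated as a match. As far as I know, newer pandas releases only match an Interval whose endpoints are exactly those of the category, so pd.Interval(5.1, 5.2) may return all False there. A sketch that should not depend on this is to look up the bin containing a point and compare against that exact category. The helper rows_in_same_bin and the sample data below are hypothetical names and values I made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with round bin edges, chosen so the result is easy to check.
values = [0.2, 0.7, 1.1, 1.4, 0.9]
edges = np.arange(0.0, 2.0, 0.5)  # 0.0, 0.5, 1.0, 1.5
frame = pd.DataFrame({
    "bins": pd.cut(values, edges, right=True, include_lowest=True),
    "values": values,
})


def rows_in_same_bin(data, point, column="bins"):
    """Return the rows whose interval in `column` contains `point`."""
    categories = data[column].cat.categories  # an IntervalIndex of all bins
    position = categories.get_loc(point)      # position of the bin holding `point`
    target = categories[position]             # the exact pd.Interval of that bin
    return data[data[column] == target]


subset = rows_in_same_bin(frame, 0.6)
print(subset)
```

Here 0.6 falls in the (0.5, 1.0] bin, so the rows holding 0.7 and 0.9 are returned. If the point lies outside every bin, get_loc raises a KeyError, which you may want to catch.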

This question, "python - subsetting a pandas DataFrame based on bins", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/45726167/
