我的数据框中有一个列,它的值在2100到8000之间。我想把这一列分成几列,每列间隔500。所以让我举个例子:

   column
    2100
    2105
    2119
     .
    8000

我想这样分开:
   column1   column2 column3  .   .  column n
    2100        0       0     .   .    0
      .         0       .     .   .    0
    2600        0       0
              2601      0     .   .    .
                .       .
              3101      0
                       3102            0
                        .
                       3602
                                      8000

请提出解决办法。

最佳答案

以下是一种使用pd.cutDataFrame.pivot的方法:

df = pd.DataFrame(list(range(2100, 8000+1)), columns=['column'])
# create the bins to be used in pd.cut
bins = list(range(df.column.min(), df.column.max()+50, 50))
# array([2100, 2150, 2200, 2250, 2300 ...
# Create the labels for pd.cut, which will be used as column names
labels = [f'column{i}' for i in range(len(bins)-1)]
# ['column0', 'column1', 'column2', 'column3', 'column4', ...
df['bins'] = pd.cut(df.column, bins, labels=labels, include_lowest=True)

它会给你:
      column       bins
0       2100    column0
1       2101    column0
2       2102    column0
3       2103    column0
4       2104    column0
5       2105    column0
6       2106    column0
7       2107    column0
8       2108    column0

现在使用pivot获得最终结果:
ix = df.groupby('bins').column.cumcount()
df.pivot(columns = 'bins', index=ix).fillna(0)

bins column0 column1 column2 column3 column4 column5 column6 column7 column8  ...
0     2100.0  2151.0  2201.0  2251.0  2301.0  2351.0  2401.0  2451.0  2501.0
1     2101.0  2152.0  2202.0  2252.0  2302.0  2352.0  2402.0  2452.0  2502.0
2     2102.0  2153.0  2203.0  2253.0  2303.0  2353.0  2403.0  2453.0  2503.0
3     2103.0  2154.0  2204.0  2254.0  2304.0  2354.0  2404.0  2454.0  2504.0
4     2104.0  2155.0  2205.0  2255.0  2305.0  2355.0  2405.0  2455.0  2505.0
5     2105.0  2156.0  2206.0  2256.0  2306.0  2356.0  2406.0  2456.0  2506.0
6     2106.0  2157.0  2207.0  2257.0  2307.0  2357.0  2407.0  2457.0  2507.0
7     2107.0  2158.0  2208.0  2258.0  2308.0  2358.0  2408.0  2458.0  2508.0
8     2108.0  2159.0  2209.0  2259.0  2309.0  2359.0  2409.0  2459.0  2509.0
9     2109.0  2160.0  2210.0  2260.0  2310.0  2360.0  2410.0  2460.0  2510.0
10    2110.0  2161.0  2211.0  2261.0  2311.0  2361.0  2411.0  2461.0  2511.0
...

让我们将其全部封装在一个函数中,并尝试使用一个更简单的示例来更好地了解其工作原理:
def binning_and_pivot(df, bin_size):
    bins = list(range(df.column.min(), df.column.max()+bin_size, bin_size))
    labels = [f'column{i}' for i in range(len(bins)-1)]
    df['bins'] = pd.cut(df.column, bins, labels=labels, include_lowest=True)
    ix = df.groupby('bins').column.cumcount()
    return df.pivot(columns = 'bins', index=ix).fillna(0)

df = pd.DataFrame(list(range(100+1)), columns=['column'])
df = df.sample(frac=0.7).reset_index(drop=True)
binning_and_pivot(df, bin_size=10)

bins column0 column1 column2 column3 column4 column5 column6 column7 column8
0        2.0    16.0    32.0    39.0    45.0    55.0    69.0    81.0    87.0
1        6.0    21.0    29.0    42.0    46.0    59.0    72.0    76.0    92.0
2        3.0    13.0    31.0    36.0    49.0    61.0    68.0    74.0    91.0
3       12.0    20.0    25.0    41.0    52.0    56.0    70.0    78.0    86.0
4        8.0    17.0    30.0    37.0    43.0    62.0    64.0    73.0    89.0
5        7.0    19.0    27.0    38.0    50.0    53.0    71.0    77.0    83.0
6        0.0    22.0    28.0     0.0     0.0    54.0    65.0    82.0    90.0
7        0.0    18.0    24.0     0.0     0.0    60.0    63.0    80.0     0.0
8        0.0    14.0    26.0     0.0     0.0     0.0     0.0    75.0     0.0


bins column9
0       95.0
1      100.0
2       96.0
3        0.0
4        0.0
5        0.0
6        0.0
7        0.0
8        0.0

关于python - Python根据范围将一列分为多列,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/56020778/

10-12 21:04