This article covers how to shift dataframe columns multiple times.

Problem description

I would like to repeatedly shift selected columns of a dataframe, based on the number of shifts specified in the array nShiftsPerCol. How can I generate an output dataframe DFO that contains only the columns with a nonzero shift count, with each of those columns shifted multiple times? Note that the first shift is zero, i.e. no shift. The shift number should be appended to the column name.

import pandas as pd 
import numpy as np 

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
print(df)
nCols = df.shape[0]
nShiftsPerCol = np.zeros(nCols)
nShiftsPerCol[0]=3 # shift column A 3 times
nShiftsPerCol[2]=2 # shift column C 2 times

Original dataframe

   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
3  4  5  6
4  5  6  7

Desired output

   A_0  A_1  A_2  C_0  C_1
0    1    2    3    3    4
1    2    3    4    4    5
2    3    4    5    5    6
3    4    5   NA    6    7
4    5   NA   NA    7   NA


Recommended answer

First create a Series and filter out the 0 values:

# use shape[1] for the number of columns (shape[0] is the number of rows)
nCols = df.shape[1]
nShiftsPerCol = np.zeros(nCols)
nShiftsPerCol[0]=3 # shift column A 3 times
nShiftsPerCol[2]=2 # shift column C 2 times

print (nShiftsPerCol)

s = pd.Series(nShiftsPerCol, df.columns).astype(int)
s = s[s!=0]
print (s)
A    3
C    2
dtype: int32

Then loop and create the new columns:

for i, x in s.items():
    for y in range(x):
        df['{}_{}'.format(i, y)] = df[i].shift(-y)

print (df)
   A  B  C  A_0  A_1  A_2  C_0  C_1
0  1  2  3    1  2.0  3.0    3  4.0
1  2  3  4    2  3.0  4.0    4  5.0
2  3  4  5    3  4.0  5.0    5  6.0
3  4  5  6    4  5.0  NaN    6  7.0
4  5  6  7    5  NaN  NaN    7  NaN
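
The loop above adds the shifted columns to df itself, so A, B and C remain in the result. If only the shifted columns are wanted, as in the desired output, one possible variation is to collect them in a dictionary and build a separate dataframe from it. The following is a minimal sketch under that assumption; the names shifted and DFO are chosen here for illustration, and the setup is repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
nShiftsPerCol = np.zeros(df.shape[1])
nShiftsPerCol[0] = 3  # shift column A 3 times
nShiftsPerCol[2] = 2  # shift column C 2 times

# Keep only the columns with a nonzero shift count.
s = pd.Series(nShiftsPerCol, index=df.columns).astype(int)
s = s[s != 0]

# Collect every shifted copy under its new name (illustrative variable name),
# then assemble the output in one step so that df itself is left untouched.
shifted = {
    '{}_{}'.format(col, k): df[col].shift(-k)
    for col, n in s.items()
    for k in range(n)
}
DFO = pd.DataFrame(shifted)
print(DFO)
```

Because a dictionary preserves insertion order, the columns of DFO should appear as A_0, A_1, A_2, C_0, C_1. The shifted columns are still converted to float, since NaN cannot be stored in a plain integer column.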

Another solution stores the column names and shift counts as a list of tuples:

L = list(zip(df.columns, nShiftsPerCol.astype(int)))
L = [x for x in L if x[1] != 0]
print (L)
[('A', 3), ('C', 2)]

for i, x in L:
    for y in range(x):
        df['{}_{}'.format(i, y)] = df[i].shift(-y)

print (df)
   A  B  C  A_0  A_1  A_2  C_0  C_1
0  1  2  3    1  2.0  3.0    3  4.0
1  2  3  4    2  3.0  4.0    4  5.0
2  3  4  5    3  4.0  5.0    5  6.0
3  4  5  6    4  5.0  NaN    6  7.0
4  5  6  7    5  NaN  NaN    7  NaN
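
In the printed results, every shifted column becomes float (2.0, 3.0, ...) because the rows that are shifted out are filled with NaN. If integer values with a missing-value marker are preferred, closer to the NA shown in the desired output, one option is pandas' nullable Int64 dtype. This is a sketch that goes beyond the original answer and assumes a pandas version that supports nullable integers (1.0 or later); the names df_nullable, shifted and DFO are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})

# Column name and shift count pairs, as in the list-of-tuples solution.
L = [('A', 3), ('C', 2)]

# Convert to the nullable integer dtype so that missing values are stored
# as <NA> and the remaining values stay integers.
df_nullable = df.astype('Int64')

shifted = {
    '{}_{}'.format(col, k): df_nullable[col].shift(-k)
    for col, n in L
    for k in range(n)
}
DFO = pd.DataFrame(shifted)
print(DFO)
print(DFO.dtypes)
```

With this dtype, the missing entries are expected to print as <NA> rather than NaN, and each column of DFO keeps the Int64 dtype.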

