问题描述
我从大约120列开始.总是属于彼此的三列.与其并排放置120列,不应该将它们堆叠在一起,因此我们最终得到了三列.这已经解决了(请参阅上面的链接).
I start out with about 120 columns. It is always three columns that belong to each other. Instead of being 120 columns side by side, they should be stacked on top of each other, so we end up with three columns. This has already been solved (see link above).
样本数据:
df = pd.DataFrame({
"1": np.random.randint(900000000, 999999999, size=5),
"2": np.random.choice( ["A","B","C", np.nan], 5),
"3": np.random.choice( [np.nan, 1], 5),
"4": np.random.randint(900000000, 999999999, size=5),
"5": np.random.choice( ["A","B","C", np.nan], 5),
"6": np.random.choice( [np.nan, 1], 5)
})
Jezrael建议的第一个问题的工作解决方案:
Working solution for initial question as suggested by Jezrael:
arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]
df = df.stack(0).sort_index(level=[1, 0]).reset_index(drop=True)
df.columns = ['A','B','C']
这改变了这个:
1 2 3 4 5 6
0 960189042 B NaN 991581392 A 1.0
1 977655199 nan 1.0 964195250 A 1.0
2 961771966 A NaN 969007327 B 1.0
3 955308022 C 1.0 973316485 A NaN
4 933277976 A 1.0 976749175 A NaN
对此:
A B C
0 960189042 B NaN
1 977655199 nan 1.0
2 961771966 A NaN
3 955308022 C 1.0
4 933277976 A 1.0
5 991581392 A 1.0
6 964195250 A 1.0
7 969007327 B 1.0
8 973316485 A NaN
9 976749175 A NaN
后续问题:现在,如果我需要一个指标,每个区块来自哪个三元组,那该怎么办呢?因此结果可能看起来像:
Follow Up Question:Now, if I'd need an indicator from which triple each block comes from, how could this be done? So a result could look like:
A B C D
0 960189042 B NaN 0
1 977655199 nan 1.0 0
2 961771966 A NaN 0
3 955308022 C 1.0 0
4 933277976 A 1.0 0
5 991581392 A 1.0 1
6 964195250 A 1.0 1
7 969007327 B 1.0 1
8 973316485 A NaN 1
9 976749175 A NaN 1
这些块的长度可以不同!所以我不能简单地添加一个计数器.
These blocks can be of different lengths! So I cannot simply add a counter.
推荐答案
使用 reset_index
仅删除第一级,第二级MultiIndex
转换为列:
arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]
df = df.stack(0).sort_index(level=[1, 0]).reset_index(level=0, drop=True).reset_index()
df.columns = ['D','A','B','C']
print (df)
D A B C
0 0 960189042 B NaN
1 0 977655199 nan 1.0
2 0 961771966 A NaN
3 0 955308022 C 1.0
4 0 933277976 A 1.0
5 1 991581392 A 1.0
6 1 964195250 A 1.0
7 1 969007327 B 1.0
8 1 973316485 A NaN
9 1 976749175 A NaN
然后,如果需要更改列的顺序:
Then if need change order of columns:
cols = df.columns[1:].tolist() + df.columns[:1].tolist()
df = df[cols]
print (df)
A B C D
0 960189042 B NaN 0
1 977655199 nan 1.0 0
2 961771966 A NaN 0
3 955308022 C 1.0 0
4 933277976 A 1.0 0
5 991581392 A 1.0 1
6 964195250 A 1.0 1
7 969007327 B 1.0 1
8 973316485 A NaN 1
9 976749175 A NaN 1
这篇关于将任何其他列追加到前三列,并指明它来自的三列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!