Problem description
My google-fu doesn't seem to be doing me justice with what seems like it should be a trivial procedure.
In pandas for Python I have two datasets that I want to merge. This works fine using .concat. The issue is that .concat reorders my columns. From a data retrieval point of view, this is trivial. From an "I just want to open the file and quickly see the most important column" point of view, it is annoying.
File1.csv
Name Username Alias1
Tom Tomfoolery TJZ
Meryl MsMeryl Mer
Timmy Midsize Yoda
File2.csv
Name Username Alias1 Alias2
Bob Firedbob Fire Gingy
Tom Tomfoolery TJZ Awww
Result.csv
Alias1 Alias2 Name Username
0 TJZ NaN Tom Tomfoolery
1 Mer NaN Meryl MsMeryl
2 Yoda NaN Timmy Midsize
0 Fire Gingy Bob Firedbob
1 TJZ Awww Tom Tomfoolery
The result is fine, but in the data file I'm working with I have 1,000 columns, and the 2-3 most important ones are now in the middle. Is there a way, in this toy example, that I could have forced "Username" to be the first column and "Name" to be the second, obviously preserving the values below each one all the way down?
As a side note, when I save to file it also saves the numbering on the side (0 1 2 0 1). If there's a way to prevent that too, that'd be cool. If not, it's not a big deal, since it's a quick fix to remove.
Thanks!
Recommended answer
Assuming the concatenated DataFrame is df, you can reorder its columns as follows:
important = ['Username', 'Name']
reordered = important + [c for c in df.columns if c not in important]
df = df[reordered]
print(df)
Output:
Username Name Alias1 Alias2
0 Tomfoolery Tom TJZ NaN
1 MsMeryl Meryl Mer NaN
2 Midsize Timmy Yoda NaN
0 Firedbob Bob Fire Gingy
1 Tomfoolery Tom TJZ Awww
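For reference, the whole flow can be sketched end to end. This is a minimal example, assuming the two files are built in memory as the hypothetical DataFrames df1 and df2 rather than read from File1.csv and File2.csv; the reordering step is the same one shown above.

```python
import pandas as pd

# Toy data mirroring File1.csv and File2.csv from the question.
# df1 and df2 are illustrative names, not part of the original post.
df1 = pd.DataFrame({
    "Name": ["Tom", "Meryl", "Timmy"],
    "Username": ["Tomfoolery", "MsMeryl", "Midsize"],
    "Alias1": ["TJZ", "Mer", "Yoda"],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Tom"],
    "Username": ["Firedbob", "Tomfoolery"],
    "Alias1": ["Fire", "TJZ"],
    "Alias2": ["Gingy", "Awww"],
})

# Stack the rows; columns missing from one frame are filled with NaN.
df = pd.concat([df1, df2])

# Put the important columns first, then every other column in its existing order.
important = ["Username", "Name"]
reordered = important + [c for c in df.columns if c not in important]
df = df[reordered]

print(list(df.columns))  # ['Username', 'Name', 'Alias1', 'Alias2']
```

Because the remaining columns are taken from df.columns as-is, this works regardless of the order in which concat happened to return them, and it scales to a 1,000-column file without listing every name.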
The numbers [0, 1, 2, 0, 1] are the index of the DataFrame. To keep them out of the output file, pass the index=False option to to_csv():
df.to_csv('Result.csv', index=False, sep=' ')
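A small sketch of the effect, assuming an in-memory io.StringIO buffer in place of Result.csv so the written text can be inspected directly. It also shows ignore_index=True on pd.concat, which is a separate option (not part of the original answer) that renumbers the rows 0..n-1 instead of repeating each frame's own index. The names part1, part2, and buffer are illustrative only.

```python
import io
import pandas as pd

# Two tiny frames, each with its own 0-based index.
part1 = pd.DataFrame({"Username": ["Tomfoolery"], "Name": ["Tom"]})
part2 = pd.DataFrame({"Username": ["Firedbob"], "Name": ["Bob"]})

# Default concat keeps the original index labels: [0, 0].
df = pd.concat([part1, part2])

# ignore_index=True renumbers the combined rows: [0, 1].
renumbered = pd.concat([part1, part2], ignore_index=True)

# index=False leaves the index column out of the written file entirely.
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=" ")
lines = buffer.getvalue().splitlines()
print(lines)  # ['Username Name', 'Tomfoolery Tom', 'Firedbob Bob']
```

Note that sep=' ' only round-trips cleanly when no value contains a space; for real data a comma or tab separator is usually the safer choice.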