Splitting a CSV file by columns
Question
I have a really huge CSV file. It has about 1700 columns and 40000 rows, like below:
x,y,z,x1,x2,x3,x4,x5,x6,x7,x8,x9,...(about 1700 more)...,x1700
0,0,0,a1,a2,a3,a4,a5,a6,a7,a8,a9,...(about 1700 more)...,a1700
1,1,1,b1,b2,b3,b4,b5,b6,b7,b8,b9,...(about 1700 more)...,b1700
// (about 40000 more rows below)
I need to split this CSV file into multiple files, each containing fewer columns, like:
# file1.csv
x,y,z
0,0,0
1,1,1
... (about 40000 more rows below)
# file2.csv
x1,x2,x3,x4,x5,x6,x7,x8,x9,...(about 1000 more)...,x1000
a1,a2,a3,a4,a5,a6,a7,a8,a9,...(about 1000 more)...,a1000
b1,b2,b3,b4,b5,b6,b7,b8,b9,...(about 1000 more)...,b1000
// (about 40000 more rows below)
# file3.csv
x1001,x1002,x1003,x1004,x1005,...(about 700 more)...,x1700
a1001,a1002,a1003,a1004,a1005,...(about 700 more)...,a1700
b1001,b1002,b1003,b1004,b1005,...(about 700 more)...,b1700
// (about 40000 more rows below)
Is there any program or library that does this?
I've googled for it, but the programs I found only split a file by rows, not by columns.
Or which language could I use to do this efficiently?
I can use R, shell script, Python, C/C++, or Java.
Recommended answer
Use a small Python script like:
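fin = 'file_in.csv'
fout1 = 'file_out1.csv'
# ... (name the other output files the same way)

# Open input and output together; both are closed when the block exits.
with open(fin) as fin_fd, open(fout1, 'w') as fout1_fd:
    # Read one line at a time so the whole file never sits in memory.
    for l in fin_fd:
        # Naive split: assumes no field contains a quoted comma.
        l_arr = l.rstrip('\n').split(',')
        # Columns 0-2 (x, y, z) go to the first output file.
        fout1_fd.write(','.join(l_arr[0:3]))
        fout1_fd.write('\n')
        # ... (write the other column slices to the other output files)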
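The script above handles one output file and leaves the rest elided. As a rough sketch of how the same idea might generalize to any number of output files, the function below uses Python's standard `csv` module, which also copes with quoted fields that contain commas. The function name `split_csv_by_columns` and the `outputs` mapping are hypothetical names chosen for illustration, not part of the original answer.

```python
import csv
from contextlib import ExitStack


def split_csv_by_columns(src_path, outputs):
    """Split a CSV file column-wise into several files.

    `outputs` (a hypothetical parameter) maps each output path to a
    (start, stop) pair of zero-based column indices, with stop exclusive,
    exactly like a Python slice.
    """
    with ExitStack() as stack:
        # newline="" lets the csv module handle line endings itself.
        src = stack.enter_context(open(src_path, newline=""))
        reader = csv.reader(src)

        writers = []
        for path, (start, stop) in outputs.items():
            out = stack.enter_context(open(path, "w", newline=""))
            writers.append((csv.writer(out, lineterminator="\n"), start, stop))

        # Stream row by row: only one row is held in memory at a time.
        for row in reader:
            for writer, start, stop in writers:
                writer.writerow(row[start:stop])
```

For the layout in the question (x, y, z followed by x1 to x1700), the mapping would presumably be `{"file1.csv": (0, 3), "file2.csv": (3, 1003), "file3.csv": (1003, 1703)}`, passed together with `"file_in.csv"` as the source path. All output files are written in a single pass over the input.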