Splitting a CSV file by columns

Question

I have a really huge CSV file. It has about 1700 columns and 40000 rows, like below:

x,y,z,x1,x2,x3,x4,x5,x6,x7,x8,x9,...(about 1700 more)...,x1700
0,0,0,a1,a2,a3,a4,a5,a6,a7,a8,a9,...(about 1700 more)...,a1700
1,1,1,b1,b2,b3,b4,b5,b6,b7,b8,b9,...(about 1700 more)...,b1700
// (about 40000 more rows below)

I need to split this CSV file into multiple files, each containing fewer columns, like:

# file1.csv
x,y,z
0,0,0
1,1,1
... (about 40000 more rows below)

# file2.csv
x1,x2,x3,x4,x5,x6,x7,x8,x9,...(about 1000 more)...,x1000
a1,a2,a3,a4,a5,a6,a7,a8,a9,...(about 1000 more)...,a1000
b1,b2,b3,b4,b5,b6,b7,b8,b9,...(about 1000 more)...,b1000
// (about 40000 more rows below)

# file3.csv
x1001,x1002,x1003,x1004,x1005,...(about 700 more)...,x1700
a1001,a1002,a1003,a1004,a1005,...(about 700 more)...,a1700
b1001,b1002,b1003,b1004,b1005,...(about 700 more)...,b1700
// (about 40000 more rows below)

Is there any program or library that does this?

I've googled for it, but the programs I found only split a file by rows, not by columns.

Or which language could I use to do this efficiently?

I can use R, shell script, Python, C/C++, or Java.

Recommended answer

Use a small Python script like:

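fin = 'file_in.csv'
fout1 = 'file_out1.csv'
...  # further output file names go here

# Stream the input line by line so the whole file is never held in memory.
# Note: a plain split(',') assumes no field contains a quoted comma.
with open(fin) as fin_fd, open(fout1, 'w') as fout1_fd:
    for l in fin_fd:
        l_arr = l.rstrip('\n').split(',')
        # columns 0-2 (x, y, z) go to the first output file
        fout1_fd.write(','.join(l_arr[0:3]))
        fout1_fd.write('\n')
        ...  # write the other column ranges to the other files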
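
The script above splits each line on commas by hand, which breaks if any field contains a quoted comma. As a rough sketch of the same idea, the version below uses Python's standard csv module and writes every column group in a single pass over the input. The function name split_csv_by_columns and the (path, start, stop) tuple layout are made up for illustration and are not part of the original answer; the column indices are 0-based with an exclusive stop, like a Python slice.

```python
import csv
from contextlib import ExitStack


def split_csv_by_columns(src_path, targets):
    """Copy column slices of src_path into separate CSV files.

    targets is a list of (out_path, start, stop) tuples (a hypothetical
    layout chosen for this sketch); each output file receives
    row[start:stop] for every row of the input.
    """
    with ExitStack() as stack:
        # newline='' lets the csv module handle line endings itself.
        src = stack.enter_context(open(src_path, newline=''))
        # Open all outputs up front so the input is read only once.
        writers = [
            csv.writer(stack.enter_context(open(path, 'w', newline='')))
            for path, _, _ in targets
        ]
        for row in csv.reader(src):
            for writer, (_, start, stop) in zip(writers, targets):
                writer.writerow(row[start:stop])
```

For the layout in the question (x, y, z followed by x1 to x1700), the call might look like split_csv_by_columns('file_in.csv', [('file1.csv', 0, 3), ('file2.csv', 3, 1003), ('file3.csv', 1003, 1703)]). Only one row is held in memory at a time, so 40000 rows of 1700 columns should not be a problem.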

