Problem description
I have a CSV file with dimensions 100*512 that I want to process further in Spark. The problem is that the file doesn't contain a header, i.e. column names, and I need these column names for further ETL for machine learning. I have the column names in another file (a text file), and I have to put them as the header of the CSV file above. For example:
CSV file:-
hs 6 89 iu 98 adf
gh 7 78 pi 54 ngj
jh 5 22 kj 78 jdk
Column header file:-
ab 1 23 sf 23 hjh
I want output like this:-
ab 1 23 sf 23 hjh
hs 6 89 iu 98 adf
gh 7 78 pi 54 ngj
jh 5 22 kj 78 jdk
Please suggest a method to add the column headers to the CSV file (without replacing the first row of the CSV file). I tried converting it to a pandas DataFrame but couldn't get the expected output.
Recommended answer
First, read your CSV file. Pass header=None, otherwise pandas uses the first data row as the column names:
from pandas import read_csv
# header=None: the file has no header row, so keep its first line as data
df = read_csv('test.csv', header=None)
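To see the difference on the sample rows from the question, here is a minimal sketch that reads an in-memory copy of them both ways. It assumes the real file is comma-separated; if it is actually space-separated, read_csv also accepts sep=r'\s+'.

```python
import io

import pandas as pd

# Sample rows from the question, assumed to be comma-separated.
raw = "hs,6,89,iu,98,adf\ngh,7,78,pi,54,ngj\njh,5,22,kj,78,jdk\n"

# Default behaviour: the first line is consumed as the header, leaving 2 data rows.
df_default = pd.read_csv(io.StringIO(raw))
print(len(df_default))  # 2

# header=None keeps every line as data and numbers the columns 0..n-1.
df_raw = pd.read_csv(io.StringIO(raw), header=None)
print(len(df_raw))           # 3
print(list(df_raw.columns))  # [0, 1, 2, 3, 4, 5]
```

The "lost" first row the question describes is the df_default case: the row is not deleted, it has become the column labels.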
Then assign the column names. The list must contain exactly one name per column; for example, if there are two columns in your dataset (column a and column b), use:
df.columns = ['a', 'b']
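With 512 columns it is easier to take the names from the separate text file than to type them. A minimal sketch, assuming the header file holds all names on one whitespace-separated line as in the example; header_text stands in for that file's contents, and names_from_header is a hypothetical helper:

```python
import io

import pandas as pd


def names_from_header(header_text: str) -> list[str]:
    """Split the single header line into column names (assumed whitespace-separated)."""
    return header_text.split()


header_text = "ab 1 23 sf 23 hjh\n"  # stand-in for the header text file
raw = "hs,6,89,iu,98,adf\ngh,7,78,pi,54,ngj\njh,5,22,kj,78,jdk\n"

df = pd.read_csv(io.StringIO(raw), header=None)
names = names_from_header(header_text)

# pandas raises ValueError on a length mismatch anyway; this gives a clearer message.
if len(names) != df.shape[1]:
    raise ValueError(f"{len(names)} names for {df.shape[1]} columns")
df.columns = names
print(list(df.columns))  # ['ab', '1', '23', 'sf', '23', 'hjh']
```

Passing names=... to read_csv is the more usual route, but read_csv rejects duplicate names, and the example header repeats 23, so this sketch assigns df.columns after reading instead.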
Write the new DataFrame to a CSV file (index=False stops pandas from adding the row index as an extra first column):
df.to_csv('test_2.csv', index=False)
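If the only goal is to put the header line on top of the file, pandas is not strictly required. A minimal standard-library sketch, assuming comma-separated data and a whitespace-separated header file; prepend_header and the file names are hypothetical:

```python
import csv
import os
import tempfile


def prepend_header(data_path: str, header_path: str, out_path: str) -> None:
    """Write out_path as one header row (from header_path) followed by all rows of data_path."""
    with open(header_path, encoding="utf-8") as fh:
        names = fh.read().split()  # header file assumed whitespace-separated
    with open(data_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(names)
        # Copy the data rows unchanged; none of them is treated as a header.
        for row in csv.reader(src):
            writer.writerow(row)


# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "test.csv")
    header = os.path.join(tmp, "header.txt")
    out = os.path.join(tmp, "test_2.csv")
    with open(data, "w", encoding="utf-8") as f:
        f.write("hs,6,89,iu,98,adf\ngh,7,78,pi,54,ngj\njh,5,22,kj,78,jdk\n")
    with open(header, "w", encoding="utf-8") as f:
        f.write("ab 1 23 sf 23 hjh\n")
    prepend_header(data, header, out)
    with open(out, encoding="utf-8") as f:
        result = f.read().splitlines()
    print(result[0])  # ab,1,23,sf,23,hjh
```

This streams the file row by row, so it never holds the whole table in memory; the resulting file can then be read in Spark with its header option enabled.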