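Final edit: Got it working, thanks everyone for the help, and special thanks to Padraic for sticking with me until it worked.
First of all, apologies if this has been asked before. I did a lot of searching, but maybe not with the right terms.
The csv file I'm working with looks like this: 0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
I have to parse this file and then write part of it to another csv, which I did with this code: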
import csv
infile = open('data/data.csv', 'r')
incsv = csv.reader(infile, delimiter = ',')
outfile = open('data/output.csv', 'w', newline = '')
outcsv = csv.writer(outfile, delimiter = ',')
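The problem is that the "name" field is formatted as
"Lastname, othernames"
and I need to split it into two fields, "lastname" and "othernames". I can't find a way to make it ignore the quotes and split the name on the delimiter (','). Each row is a list, so .strip() doesn't work, and I can't tell whether QUOTE_NONE would help or whether I just haven't got the syntax right.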
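A quick way to see what is going on: with its default settings (quotechar '"'), csv.reader already treats the quoted name as one field and removes the quotes itself, so there is nothing left to strip. QUOTE_NONE does the opposite of what is wanted here. This sketch holds the question's sample line in an in-memory io.StringIO instead of reading data/data.csv; that substitution is only so it runs on its own.

```python
import csv
import io

# The sample line from the question, held in memory instead of data/data.csv.
line = '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S\n'

# Default dialect: the double quotes group "Braund, Mr. Owen Harris" into a
# single field, and the reader drops the quote characters.
row = next(csv.reader(io.StringIO(line)))
print(row[2])  # Braund, Mr. Owen Harris

# QUOTE_NONE treats '"' as an ordinary character, so the comma inside the
# name starts a new field and the name is broken in two.
broken = next(csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONE))
print(broken[2:4])  # ['"Braund', ' Mr. Owen Harris"']
```

So the name arrives as the plain string row[2], and splitting it is ordinary string work (str.split or str.partition) rather than a csv setting.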
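It probably goes without saying, but I'm very new to all of this.
Edit: I got errors with the suggested solutions, so I'm including the rest of my code in the hope that it shows where things go wrong.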
import csv
infile = open('data/titanic.csv', 'r')
incsv = csv.reader(infile, delimiter = ',')
outfile = open('data/survivors.csv', 'w', newline = '')
outcsv = csv.writer(outfile, delimiter = ',')
dict = {}
for row in incsv:
    survived, pclass, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked = row
    if survived == "1":
        if name not in dict:
            dict[name] = name, pclass, sex, age
names = dict.keys()
sorted_names = sorted(names)
for name in sorted_names:
    (name, pclass, sex, age) = dict[name]
    rowOutput = (name, pclass, sex, age)
    outcsv.writerow(rowOutput)
outfile.close()
infile.close()
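So this parses the original csv, filters on survived == '1', adds each name to a dict (I know I'll need to adjust this once the name field is split), and sorts the dict's keys alphabetically.
Edit: here is more of the original file, as requested. Sorry for not including more at first.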
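A minimal sketch of the adjustment mentioned above, keeping the loop's structure: the dict stays keyed by the full name (so the `name not in` check still de-duplicates), and the name is split only when each row is written. It is wrapped in a hypothetical helper, write_survivors, that takes already-open file objects so it can be tried on an in-memory sample; the output columns (lastname, othernames, pclass, sex, age) are an assumption about what is wanted.

```python
import csv
import io


def write_survivors(infile, outfile):
    """Hypothetical helper: copy survivors to outfile as
    lastname, othernames, pclass, sex, age."""
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile)
    survivors = {}  # full name -> (pclass, sex, age); avoids shadowing the builtin dict
    for row in incsv:
        survived, pclass, name, sex, age = row[:5]
        # The header row has survived == "survived", so it is skipped here too.
        if survived == "1" and name not in survivors:
            survivors[name] = (pclass, sex, age)
    # Sort alphabetically on the full "Lastname, othernames" string, as the original loop did.
    for name in sorted(survivors):
        lastname, _, othernames = name.partition(",")
        pclass, sex, age = survivors[name]
        outcsv.writerow([lastname.strip(), othernames.strip(), pclass, sex, age])


# Three rows from the sample in the question, held in memory.
sample = io.StringIO(
    "survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n"
    '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
    '1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C\n'
)
out = io.StringIO()
write_survivors(sample, out)
print(out.getvalue())
```

With real files, the call would sit inside `with open('data/titanic.csv', newline='') as f, open('data/survivors.csv', 'w', newline='') as g:` as `write_survivors(f, g)`.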
survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
That's 10 lines out of 892 (891 without the header).
Best answer
If the data is always in the same column, you can just split it:
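Because the file starts with a header row, csv.DictReader (a standard-library alternative, not what the question uses) can hand back each row as a dict keyed by column name, which avoids unpacking all eleven columns. A sketch on two of the sample rows held in memory; note that a missing value such as Moran's age comes through as an empty string.

```python
import csv
import io

sample = io.StringIO(
    "survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked\n"
    '1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q\n'
)

records = []
for rec in csv.DictReader(sample):  # the first line supplies the keys
    lastname, _, othernames = rec["name"].partition(",")
    records.append((rec["survived"], lastname, othernames.strip(), rec["age"]))

for r in records:
    print(r)
```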
In [20]: s = '0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S'
In [21]: import csv
In [22]: row = (next(csv.reader([s])))
In [23]: row
Out[23]: ['0', '3', 'Braund, Mr. Owen Harris', 'male', '22', '1', '0', 'A/5', '21171', '7.25', 'S']
In [24]: last,first = row[2].split(",")
In [25]: last, first.strip()
Out[25]: ('Braund', 'Mr. Owen Harris')
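One thing to watch with `last, first = ....split(",")`: unpacking into exactly two names only works when the name contains exactly one comma, and any other count raises ValueError. Passing maxsplit=1 splits on the first comma only. The two-comma name below is hypothetical, made up just to show the failure.

```python
name = "Braund, Mr. Owen Harris"
last, first = name.split(",")  # exactly one comma: fine
print(last, "|", first.strip())  # Braund | Mr. Owen Harris

odd = "Smith, Jr., Mr. John"  # hypothetical name with two commas
try:
    last, first = odd.split(",")  # three pieces, two targets
except ValueError:
    print("plain split failed")

# maxsplit=1: split on the first comma only and keep the rest together.
odd_last, odd_first = odd.split(",", 1)
print(odd_last, "|", odd_first.strip())  # Smith | Jr., Mr. John
```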
Assuming you want to use the last name as the key:
import csv
from operator import itemgetter

dct = {}
with open('data/titanic.csv', newline='') as infile, open('data/survivors.csv', 'w', newline='') as outfile:
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile)
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), incsv):
        if survived == "1":
            last, first = name.split(",")
            dct[last] = [first.strip(), pclass, sex, age]
    sorted_names = sorted(dct)
    for last_name in sorted_names:
        outcsv.writerow([last_name] + dct[last_name])
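itemgetter(0, 1, 2, 3, 4) pulls out just the first five columns we care about. We unpack those five values in the for loop, split the name, and use the last name as the key. If the first name could be missing, you can use str.partition instead: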
last, _, first = name.partition(",")
dct[last] = [first.strip(), pclass, sex, age]  # a list, so [last_name] + dct[last_name] still works
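The two helpers from the paragraph above, shown on the sample row: itemgetter with several indices returns a tuple of those items (the same values as row[:5]), and str.partition always returns three parts, so the three-way unpacking cannot fail even when there is no comma. The comma-free name is hypothetical.

```python
from operator import itemgetter

row = ['0', '3', 'Braund, Mr. Owen Harris', 'male', '22',
       '1', '0', 'A/5', '21171', '7.25', 'S']

first_five = itemgetter(0, 1, 2, 3, 4)
print(first_five(row))  # ('0', '3', 'Braund, Mr. Owen Harris', 'male', '22')

# partition splits at the first comma and always returns (head, sep, tail).
print("Braund, Mr. Owen Harris".partition(","))  # ('Braund', ',', ' Mr. Owen Harris')

# Hypothetical name with no comma: sep and tail come back empty.
print("Nocomma".partition(","))  # ('Nocomma', '', '')
last, _, first = "Nocomma".partition(",")
```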
The final output format is:
last_name, other_names, pclass, sex, age
Output on some sample lines:
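If the output file should also carry those column names, csv.writer can write them as an ordinary first row before the data. A small sketch writing to an in-memory buffer; the header names are simply the ones listed above, and the Smith row is hypothetical, included to show that a field containing the delimiter is quoted automatically.

```python
import csv
import io

out = io.StringIO()
outcsv = csv.writer(out)
outcsv.writerow(["last_name", "other_names", "pclass", "sex", "age"])  # header row
outcsv.writerow(["Braund", "Mr. Owen Harris", "3", "male", "22"])
# Hypothetical row: the comma inside the field makes the writer quote it.
outcsv.writerow(["Smith", "Jr., Mr. John", "3", "male", "40"])
print(out.getvalue())
```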
In [2]: cat test.csv
1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
0,3,"Braund1, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S
1,3,"Braund3, Mr. Owen2 Harris2",male,22,1,0,A/5,21171,7.25,S
0,3,"Braund2, Mr. Owen2 Harris2",male,22,1,0,A/5,21171,7.25,S
In [3]: cat survivors.csv
In [4]: paste
from operator import itemgetter
import csv
dct = {}
with open('test.csv') as infile, open('survivors.csv', 'w', newline='') as outfile:
    incsv = csv.reader(infile)
    outcsv = csv.writer(outfile)
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), incsv):
        if survived == "1":
            last, first = name.split(",")
            dct[last] = [first.strip(), pclass, sex, age]
    sorted_names = sorted(dct)
    for last_name in sorted_names:
        outcsv.writerow([last_name] + dct[last_name])
## -- End pasted text --
In [5]: cat survivors.csv
Braund,Mr. Owen Harris,3,male,22
Braund3,Mr. Owen2 Harris2,3,male,22
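One caveat with using the last name as the key: a later survivor with the same surname overwrites the earlier one in dct, so a family collapses to a single row. If that matters, here is a sketch that keeps every row in a list and sorts on (last_name, other_names) instead. survivors_sorted is a hypothetical helper name, and the second "Braund" row in the sample is made up to show the difference.

```python
import csv
import io
from operator import itemgetter


def survivors_sorted(infile):
    """Hypothetical helper: return [last, other, pclass, sex, age] rows for
    survivors, sorted by last name, then other names."""
    rows = []
    for survived, pclass, name, sex, age in map(itemgetter(0, 1, 2, 3, 4), csv.reader(infile)):
        if survived == "1":
            last, _, first = name.partition(",")
            rows.append([last.strip(), first.strip(), pclass, sex, age])
    # Nothing is keyed by surname, so both Braund rows are kept.
    return sorted(rows, key=itemgetter(0, 1))


sample = io.StringIO(
    '1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5,21171,7.25,S\n'
    '1,3,"Braund3, Mr. Owen2 Harris2",male,22,1,0,A/5,21171,7.25,S\n'
    '1,2,"Braund, Miss. Example",female,30,0,0,X,0,0,S\n'  # hypothetical row
)
result = survivors_sorted(sample)
for r in result:
    print(r)
```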