I have the Python code shown below, but it does not return what I want. My input is a file like this example:
AAAS,ENST00000552161,1.70232E-30
AAAS,ENST00000548258,1.09222E-84
AAAS,ENST00000549450,1.3171E-108
AAAS,ENST00000209873,22.3297
AAAS,ENST00000546562,0.170807
AAAS,ENST00000394384,5.53609
AAAS,ENST00000547238,0.829774
AACS,ENST00000316543,0.49901
AACS,ENST00000261686,2.41428
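The first column contains many repeated entries. For each of them I want to keep only one row, the one with the highest value in the third column, like the following lines: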
AAAS,ENST00000209873,22.3297
AACS,ENST00000261686,2.41428
Here is the code:
import csv
from collections import defaultdict
with open('data.csv', newline='') as f, open('out.csv', 'w', newline='') as out:
    f_reader = csv.reader(f)
    out_writer = csv.writer(out)
    d = defaultdict(list)
    for line in f_reader:
        d[line[1]].append(line)
    for _, v in d.items():
        new_line = sorted(v, key=lambda i: float(i[2]), reverse=True)[0]
        out_writer.writerow(new_line)
Do you know what the problem is?
Best answer
This is a perfect problem for pandas:
import pandas as pd
df = pd.read_csv('data.csv',header=None)
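# groupby(0).max() would take the maximum of each column separately, pairing
# the largest ID string with the largest value; idxmax keeps whole rows.
df.loc[df.groupby(0)[2].idxmax()]
#      0                1         2
#3  AAAS  ENST00000209873  22.32970
#8  AACS  ENST00000261686   2.41428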
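
As for what goes wrong in the original csv code: it groups on line[1], the transcript ID, which is unique in every row. Each group therefore holds exactly one row, and every row is written back out. Grouping on line[0] instead should give the expected result. Below is a minimal sketch without pandas, assuming the same three-column layout with no header row. The function name best_rows and the in-memory sample are illustrative only and are not part of the original post.

```python
import csv
import io


def best_rows(rows):
    """For each value in column 0, keep the row with the largest float in column 2."""
    best = {}
    for row in rows:
        key = row[0]  # group on the gene name, not on row[1] (the transcript ID)
        if key not in best or float(row[2]) > float(best[key][2]):
            best[key] = row
    # dicts preserve insertion order (Python 3.7+), so groups come out
    # in the order they first appear in the input
    return list(best.values())


# Illustrative sample taken from the question; a real run would read 'data.csv'.
sample = """AAAS,ENST00000552161,1.70232E-30
AAAS,ENST00000209873,22.3297
AAAS,ENST00000546562,0.170807
AACS,ENST00000316543,0.49901
AACS,ENST00000261686,2.41428
"""

result = best_rows(csv.reader(io.StringIO(sample)))

# Write the selected rows as CSV (to an in-memory buffer here instead of 'out.csv').
out = io.StringIO()
csv.writer(out).writerows(result)
print(out.getvalue())
```

Tracking a running best row per key avoids storing and sorting every group. The same one-line change (line[0] in place of line[1]) also works in the original defaultdict version.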