I have the Python code shown below, but it does not return what I want. My input file looks like this example:

AAAS,ENST00000552161,1.70232E-30
AAAS,ENST00000548258,1.09222E-84
AAAS,ENST00000549450,1.3171E-108
AAAS,ENST00000209873,22.3297
AAAS,ENST00000546562,0.170807
AAAS,ENST00000394384,5.53609
AAAS,ENST00000547238,0.829774
AACS,ENST00000316543,0.49901
AACS,ENST00000261686,2.41428


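The first column contains many repeated entries. For each distinct entry I want to keep only one row: the one with the highest value in the third column. For the sample above, that would be the following lines: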

AAAS,ENST00000209873,22.3297
AACS,ENST00000261686,2.41428


Here is the code:

import csv
from collections import defaultdict

with open('data.csv', newline='') as f, open('out.csv', 'w', newline='') as out:
    f_reader = csv.reader(f)
    out_writer = csv.writer(out)
    d = defaultdict(list)
    for line in f_reader:
        d[line[1]].append(line)
    for _,v in d.items():
        new_line = sorted(v, key=lambda i:float(i[2]), reverse=True)[0]
        out_writer.writerow(new_line)


Do you know what the problem is?

Best answer

This is a perfect job for pandas:

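import pandas as pd
df = pd.read_csv('data.csv', header=None)
# For each value in column 0, keep the row whose column 2 is largest.
# (groupby(0).max() alone takes the max of each column separately, so it
# can pair a transcript ID with a value that comes from a different row.)
df.loc[df.groupby(0)[2].idxmax()]
#      0                1         2
#3  AAAS  ENST00000209873  22.32970
#8  AACS  ENST00000261686   2.41428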
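
The csv-based code from the question can also be repaired without pandas. In that code, `d[line[1]]` groups rows by the second column (the transcript ID), which is different on every row of the sample, so each row forms its own group and all rows get written out. A minimal sketch of a fix, assuming the first column is the intended grouping key; the helper name `best_rows` and the in-memory sample are illustrative only and not part of the original code:

```python
import csv
import io


def best_rows(rows):
    """For each key in column 0, keep the row with the largest float in column 2."""
    best = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[0]  # group by column 0 (gene), not column 1 (transcript ID)
        if key not in best or float(row[2]) > float(best[key][2]):
            best[key] = row
    # dicts preserve insertion order, so keys come out in first-seen order
    return list(best.values())


# Sample rows copied from the question; a real run would pass
# csv.reader(open('data.csv', newline='')) instead.
sample = """AAAS,ENST00000552161,1.70232E-30
AAAS,ENST00000548258,1.09222E-84
AAAS,ENST00000549450,1.3171E-108
AAAS,ENST00000209873,22.3297
AAAS,ENST00000546562,0.170807
AAAS,ENST00000394384,5.53609
AAAS,ENST00000547238,0.829774
AACS,ENST00000316543,0.49901
AACS,ENST00000261686,2.41428
"""

result = best_rows(csv.reader(io.StringIO(sample)))
for row in result:
    print(','.join(row))
```

Tracking only the current best row per key avoids building and sorting a list for every group, though `max(v, key=lambda i: float(i[2]))` over the original `defaultdict(list)` would work equally well once the key is changed to `line[0]`.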

10-08 10:50