python - Find duplicates in a CSV with Python and delete the oldest

I have a CSV file with entries like these, but no header row:

abcd,123,2017-09-27 17:38:38
cdfg,324,2017-09-27 18:38:38
abcd,123,2017-09-27 19:38:38
cdfg,423,2017-09-27 16:38:38

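I want to find duplicates in the first column and delete the older entries based on the third column, which is in datetime format.
The result should be: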

abcd,123,2017-09-27 19:38:38
cdfg,423,2017-09-27 16:38:38

Any ideas?

Best answer

Using the csv module from the standard library, you can do the following:

import csv
from collections import OrderedDict
# you can use a normal dict if the order of the rows does not matter

with open('file.csv', newline='') as f:
    r = csv.reader(f)
    d = OrderedDict()  # first-column value -> newest row seen so far
    for row in r:
        # 'YYYY-MM-DD HH:MM:SS' timestamps compare chronologically as plain strings
        if row[0] not in d or d[row[0]][2] < row[2]:
            d[row[0]] = row

print(list(d.values()))
# [['abcd', '123', '2017-09-27 19:38:38'], ['cdfg', '324', '2017-09-27 18:38:38']]

with open('file_out.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerows(d.values())
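
The string comparison above only works because the timestamps are zero-padded and ordered from year down to second. If the dates were in another layout, parsing them first would be safer. Here is a minimal sketch of that variant; the helper name keep_newest and its parameters key_col, time_col and fmt are made up for illustration and are not part of the original answer, and the sample rows are read from an in-memory string instead of file.csv:

```python
import csv
import io
from datetime import datetime


def keep_newest(rows, key_col=0, time_col=2, fmt="%Y-%m-%d %H:%M:%S"):
    """Return one row per key, keeping the row with the latest timestamp.

    keep_newest, key_col, time_col and fmt are illustrative names only.
    """
    newest = {}  # key -> (parsed timestamp, row); dicts keep insertion order on Python 3.7+
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        stamp = datetime.strptime(row[time_col], fmt)
        key = row[key_col]
        if key not in newest or newest[key][0] < stamp:
            newest[key] = (stamp, row)
    return [row for _, row in newest.values()]


# The four rows from the question, held in memory for the demonstration
sample = io.StringIO(
    "abcd,123,2017-09-27 17:38:38\n"
    "cdfg,324,2017-09-27 18:38:38\n"
    "abcd,123,2017-09-27 19:38:38\n"
    "cdfg,423,2017-09-27 16:38:38\n"
)
result = keep_newest(csv.reader(sample))
print(result)
# [['abcd', '123', '2017-09-27 19:38:38'], ['cdfg', '324', '2017-09-27 18:38:38']]
```

On the sample rows this keeps cdfg,324 because 18:38:38 is later than 16:38:38. To use it on a real file, pass csv.reader(f) for a file opened with newline='' and write the returned list with csv.writer(...).writerows(...).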

Regarding python - Find duplicates in a CSV with Python and delete the oldest, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46466761/