This article shows how to remove duplicate rows from a CSV file with a Python script; the approach may be a useful reference for anyone facing the same problem.
Problem description
Goal
I have downloaded a CSV file from hotmail, but it has a lot of duplicates in it. These duplicates are complete copies and I don't know why my phone created them.
I want to get rid of the duplicates.
Approach
Write a python script to remove duplicates.
Technical specification
Windows XP SP 3 Python 2.7 CSV file with 400 contacts
Solution
UPDATE: 2016
If you are happy to use the helpful more_itertools
external library:
from more_itertools import unique_everseen

with open('1.csv', 'r') as f, open('2.csv', 'w') as out_file:
    out_file.writelines(unique_everseen(f))
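For reference, unique_everseen yields each element the first time it appears while preserving input order. A minimal stdlib-only sketch of the same idea (the helper name mirrors the library's, but this is not the library's code):

```python
def unique_everseen(iterable):
    """Yield items the first time they appear, preserving input order."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

# Duplicate lines are dropped; first occurrences are kept in order:
lines = ["a,b\n", "c,d\n", "a,b\n", "e,f\n"]
print(list(unique_everseen(lines)))
```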
A more efficient version of @IcyFlame's solution
with open('1.csv', 'r') as in_file, open('2.csv', 'w') as out_file:
    seen = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen:
            continue  # skip duplicate
        seen.add(line)
        out_file.write(line)
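Whole-line comparison is enough here because the duplicates are exact copies. If the file contained quoted fields with embedded newlines, comparing parsed rows via the csv module would be more robust. A Python 3 sketch under that assumption (the function name and file paths are illustrative):

```python
import csv

def dedupe_csv(in_path, out_path):
    """Write out_path containing only the first occurrence of each parsed row."""
    seen = set()
    with open(in_path, newline='') as in_file, \
         open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(in_file):
            key = tuple(row)  # lists are unhashable; tuples work as set keys
            if key in seen:
                continue  # skip duplicate row
            seen.add(key)
            writer.writerow(row)
```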
To edit the same file in-place you could use this
import fileinput

seen = set()  # set for fast O(1) amortized lookup
for line in fileinput.FileInput('1.csv', inplace=1):
    if line in seen:
        continue  # skip duplicate
    seen.add(line)
    print line,  # standard output is now redirected to the file
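Under Python 3, print is a function and fileinput takes inplace=True. An equivalent sketch of the in-place version, wrapped in a function for reuse (the function name is just illustrative):

```python
import fileinput

def dedupe_in_place(path):
    """Rewrite the file at path, keeping only the first copy of each line."""
    seen = set()
    for line in fileinput.input(path, inplace=True):
        if line in seen:
            continue  # skip duplicate
        seen.add(line)
        print(line, end='')  # stdout is redirected into the file here
```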