Removing duplicates of an entire row

Problem description

Hi guys,

I have thousands of rows with 106 columns. The first column (chromosome and location) contains just a chromosome and location, and its values can be duplicated. The remaining columns are numbered 1-105, and each corresponds to a sample number. If a sample has a certain chromosome and location, I want to put the number one in that cell, so that at the end I can calculate, for each sample, the sum of the cells that contain a one.

The part I am having a tough time programming in Python is how to write this to a file when the same key appears more than once for different samples. How can I add the number one to that cell so I can get the sum later on?

Thanks a lot in advance. The code I have so far is below:

    with open(os.path.join(file_out + ".txt"), 'w') as outpt:
        dic = defaultdict(list)
        dic[chro_pos].append(sample_num)
        outpt.write("chrom_pos" + "\t" + "\t".join(samp_num) + "\t" + "\n")
        for k, val in dic.iteritems():  # k is the chromosome:location; val is the sample number, 1 out of 105
            for v in val:
                outpt_TSS.write(int(k) * ("\t") + str(1) + '\n')
                # This will have duplicate chrom_pos entries and I don't want that.
                # I want one chrom_pos with ones corresponding to multiple samples.

Solution

Write each val to a new list and, before writing the next one, check whether it already exists in that list; if it does, skip it.
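The answer above is brief, so here is one possible way to sketch it in Python 3. This is not the answerer's code: it swaps the "check the list, then skip" step for a `set` per chromosome:location, which drops repeated samples automatically. The function names (`build_presence_rows`, `column_sums`, `write_table`), the input shape (an iterable of `(chrom_pos, sample_num)` pairs), and the example data are all assumptions made for illustration.

```python
from collections import defaultdict

N_SAMPLES = 105  # sample columns 1..105, as described in the question


def build_presence_rows(pairs, n_samples=N_SAMPLES):
    """Collapse (chrom_pos, sample_num) pairs into one 0/1 row per chrom_pos."""
    samples_by_pos = defaultdict(set)  # a set silently ignores repeated pairs
    for chrom_pos, sample_num in pairs:
        samples_by_pos[chrom_pos].add(sample_num)

    rows = []
    for chrom_pos in sorted(samples_by_pos):
        present = samples_by_pos[chrom_pos]
        # 1 where the sample has this chrom_pos, 0 otherwise
        flags = [1 if s in present else 0 for s in range(1, n_samples + 1)]
        rows.append((chrom_pos, flags))
    return rows


def column_sums(rows, n_samples=N_SAMPLES):
    """Sum the ones in each sample column."""
    totals = [0] * n_samples
    for _, flags in rows:
        for i, flag in enumerate(flags):
            totals[i] += flag
    return totals


def write_table(path, rows, n_samples=N_SAMPLES):
    """Write a tab-separated table: header, one line per chrom_pos, then totals."""
    with open(path, "w") as out:
        header = ["chrom_pos"] + [str(s) for s in range(1, n_samples + 1)]
        out.write("\t".join(header) + "\n")
        for chrom_pos, flags in rows:
            out.write(chrom_pos + "\t" + "\t".join(map(str, flags)) + "\n")
        totals = column_sums(rows, n_samples)
        out.write("total\t" + "\t".join(map(str, totals)) + "\n")


# Made-up example data with three samples; ("chr1:100", 1) appears twice.
pairs = [("chr1:100", 1), ("chr1:100", 3), ("chr2:200", 3), ("chr1:100", 1)]
rows = build_presence_rows(pairs, n_samples=3)
```

With this layout each chromosome:location appears on exactly one line, and the per-sample sums come from adding up each column. Calling `write_table("out.txt", rows, n_samples=3)` would write the example to a file; the file name is hypothetical.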