Problem description
I have two data files in tab-separated CSV format. Both files have the following layout:
EP Code   EP Name   Address       Region      ...
101654    Alpha     York Street   Northwest   ...
103628    Beta      5th Avenue    South       ...
EP codes are unique. What I want to do is to compare two files with respect to EP codes, determine the different rows and write them into a new file.
For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could be a file completely including file1 plus 50 rows; or it could be file1 - 10 rows + 60 rows. I want to determine the differences between two data sets. I'm not interested in the mutual rows.
How can I do this in R?
Recommended answer
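There are many ways to do this, including setdiff, intersect, the %in% operator, and is.element. The simplest is to test each EP code for membership in the other file with %in% and negate the result with !, which gives a logical row index:
# rows of file1 whose EP code does not appear in file2
diff1 <- file1[!(file1$ep.code %in% file2$ep.code), ]
# rows of file2 whose EP code does not appear in file1
diff2 <- file2[!(file2$ep.code %in% file1$ep.code), ]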
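The question also asks for the differing rows to be written to a new file. A minimal end-to-end sketch follows, assuming the key column is named ep.code as in the answer. The toy data frames, the added source column, and the output path are illustrative and not from the original post. With real files you would load them with read.delim, which by default turns a header such as "EP Code" into the column name EP.Code, so the column name would need adjusting.

```r
# Toy stand-ins for file1.csv and file2.csv (hypothetical data).
# With real files, something like:
#   file1 <- read.delim("file1.csv", stringsAsFactors = FALSE)
file1 <- data.frame(ep.code = c(101654, 103628, 104200),
                    ep.name = c("Alpha", "Beta", "Gamma"),
                    stringsAsFactors = FALSE)
file2 <- data.frame(ep.code = c(101654, 104200, 105001),
                    ep.name = c("Alpha", "Gamma", "Delta"),
                    stringsAsFactors = FALSE)

# Rows whose EP code appears in one file but not the other.
only_in_1 <- file1[!(file1$ep.code %in% file2$ep.code), ]
only_in_2 <- file2[!(file2$ep.code %in% file1$ep.code), ]

# Record which file each row came from (an added, illustrative column).
# rep() keeps this safe when one side has no unmatched rows.
only_in_1$source <- rep("file1", nrow(only_in_1))
only_in_2$source <- rep("file2", nrow(only_in_2))

# Stack both sets and write them out tab-separated.
differences <- rbind(only_in_1, only_in_2)
out_path <- file.path(tempdir(), "differences.csv")
write.table(differences, out_path, sep = "\t",
            row.names = FALSE, quote = FALSE)
```

Rows shared by both files are dropped, and the source column shows whether a row was removed from file1 or newly added in file2.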