Determining the different rows between two data sets in R


Question


I have two tab-separated data files (named with a .csv extension). The files have the following format:

EP Code    EP Name    Address        Region       ...
101654     Alpha      York Street    Northwest    ...
103628     Beta       5th Avenue     South        ...
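A minimal sketch of loading data in this format, assuming each file is plain tab-separated text with a header row. Base R's read.delim() defaults to sep = "\t" and header = TRUE, and its default check.names = TRUE turns the header "EP Code" into the syntactic name EP.Code. The inline sample_text stands in for the real file; reading every column as character (colClasses = "character") is an assumption that keeps codes such as 101654 exactly as written.

```r
# Two sample rows in the format shown above. With a real file you would
# pass its path instead, e.g. read.delim("file1.csv", colClasses = "character").
sample_text <- "EP Code\tEP Name\tAddress\tRegion
101654\tAlpha\tYork Street\tNorthwest
103628\tBeta\t5th Avenue\tSouth"

# read.delim() is read.table() with sep = "\t" and header = TRUE;
# colClasses = "character" is recycled so every column stays text.
file1 <- read.delim(text = sample_text, colClasses = "character")

# Header names are made syntactic: "EP Code" becomes "EP.Code"
names(file1)
```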


EP codes are unique. What I want to do is compare the two files by EP code, determine which rows differ, and write those rows to a new file.


For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could contain all of file1 plus 50 new rows, or it could be file1 minus 10 rows plus 60 new rows. I want to determine the differences between the two data sets; I'm not interested in the rows they have in common.
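To make the second scenario concrete, here is a toy simulation with hypothetical integer codes standing in for EP codes. file2 drops 10 of file1's 800 codes and adds 60 new ones, so it has 850 rows; the rows of interest are the 10 codes only in file1 plus the 60 only in file2, while the 790 shared codes are ignored. setdiff() is base R.

```r
# Hypothetical codes: file1 holds 800 codes; file2 drops codes 1..10
# and adds 60 new ones (800 - 10 + 60 = 850 rows).
codes1 <- 1:800
codes2 <- c(11:800, 801:860)

only_in_file1 <- setdiff(codes1, codes2)   # codes dropped from file1: 1..10
only_in_file2 <- setdiff(codes2, codes1)   # codes new in file2: 801..860

length(codes2)          # 850
length(only_in_file1)   # 10
length(only_in_file2)   # 60
```

Note that row counts alone cannot distinguish the two scenarios (both give 850 rows), which is why the comparison has to be made on the EP codes themselves.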

How can I do this in R?

Recommended answer


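There are many ways to do this, including setdiff(), intersect(), the %in% operator and is.element(). Compare the EP codes to decide which rows belong to only one file: keep the rows of file1 whose codes are in the set difference, and drop the rows of file2 whose codes are in the intersection by negating the match with !: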

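# Rows of file1 whose EP code does not appear in file2
diff1 <- file1[file1$ep.code %in% setdiff(file1$ep.code, file2$ep.code), ]

# Rows of file2 whose EP code is not in the intersection with file1
diff2 <- file2[!(file2$ep.code %in% intersect(file2$ep.code, file1$ep.code)), ]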
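The question also asks for the differing rows to be written to a new file. A minimal end-to-end sketch, assuming both files have the same columns and that the key column is named EP.Code (the name read.delim() produces from the header "EP Code"; the answer above calls it ep.code). The helper rows_not_in_common() and the output name differences.tsv are hypothetical, and the small data frames stand in for the real files. It uses %in% for one side and is.element() for the other, which are equivalent.

```r
# Hypothetical helper: rows whose key appears in only one of the two data
# frames, tagged with the file they came from. Assumes identical columns.
rows_not_in_common <- function(df1, df2, key) {
  only1 <- df1[!(df1[[key]] %in% df2[[key]]), , drop = FALSE]
  only2 <- df2[!is.element(df2[[key]], df1[[key]]), , drop = FALSE]
  # rep() keeps this working even when one side has zero rows
  only1$source <- rep("file1", nrow(only1))
  only2$source <- rep("file2", nrow(only2))
  rbind(only1, only2)
}

# Small stand-ins for file1.csv and file2.csv
file1 <- data.frame(EP.Code = c("101654", "103628", "104001"),
                    EP.Name = c("Alpha", "Beta", "Gamma"),
                    stringsAsFactors = FALSE)
file2 <- data.frame(EP.Code = c("103628", "104001", "105777"),
                    EP.Name = c("Beta", "Gamma", "Delta"),
                    stringsAsFactors = FALSE)

diffs <- rows_not_in_common(file1, file2, key = "EP.Code")
print(diffs)

# Write the differing rows to a new tab-separated file
out_path <- file.path(tempdir(), "differences.tsv")
write.table(diffs, out_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Tagging each row with its source keeps the "dropped from file1" and "new in file2" cases apart in a single output file; if only one direction matters, the corresponding half of the rbind() can be written on its own.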
