r - Compare two data.frames and remove rows with common characters

I have two data.frames, x1 and x2. I want to remove a row from x2 if it shares a common gene with x1.

x1 <- chr   start   end     Genes
      1     8401    8410    Mndal,Mnda,Ifi203,Ifi202b
      2     8001    8020    Cyb5r1,Adipor1,Klhl12
      3     4001    4020    Alyref2,Itln1,Cd244

x2 <- chr   start   end     Genes
      1     8861    8868    Olfr1193
      1     8405    8420    Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b
      2     8501    8520    Chia,Chi3l3,Chi3l4
      3     4321    4670    Tdpoz4,Tdpoz3,Tdpoz5

The expected result is:

x2 <- chr   start   end     Genes
      1     8861    8868    Olfr1193
      2     8501    8520    Chia,Chi3l3,Chi3l4
      3     4321    4670    Tdpoz4,Tdpoz3,Tdpoz5
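To make the example reproducible, here is a minimal sketch that builds x1 and x2 as data frames, with the values copied from the tables above (`stringsAsFactors = FALSE` is an added assumption so that Genes is stored as character). It also shows the most literal reading of the question: drop an x2 row if any of its genes appears anywhere in x1, regardless of chr. The names `all_x1_genes` and `shares_gene` are illustrative only.

```r
# Rebuild the example tables from the question.
# stringsAsFactors = FALSE keeps Genes as character (the default only since R 4.0).
x1 <- data.frame(
  chr   = c(1, 2, 3),
  start = c(8401, 8001, 4001),
  end   = c(8410, 8020, 4020),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b",
            "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)
x2 <- data.frame(
  chr   = c(1, 1, 2, 3),
  start = c(8861, 8405, 8501, 4321),
  end   = c(8868, 8420, 8520, 4670),
  Genes = c("Olfr1193",
            "Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4",
            "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)

# Every gene mentioned anywhere in x1, as one character vector.
all_x1_genes <- unlist(strsplit(x1$Genes, ","))

# TRUE for each x2 row that contains at least one of those genes.
shares_gene <- vapply(strsplit(x2$Genes, ","),
                      function(g) any(g %in% all_x1_genes),
                      logical(1))

result <- x2[!shares_gene, ]
print(result)
```

On this data only row 2 of x2 (which contains Mndal) is dropped, so rows 1, 3 and 4 remain, which is the expected table above.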

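Best answer

You can try the following. It pairs row i of x1 with row i of x2, so it assumes both data frames have the same number of rows, in matching order: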

# keep the x2 rows that share no gene with the x1 row in the same position
x2[mapply(function(x,y) !any(x %in% y),
        strsplit(x1$Genes, ','), strsplit(x2$Genes, ',')),]
#  chr start  end                Genes
#2   2  8501 8520   Chia,Chi3l3,Chi3l4
#3   3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5

Or replace !any(x %in% y) with length(intersect(x, y)) == 0.
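A minimal sketch of the intersect() variant, for illustration only. Because this form compares row i of x1 with row i of x2, the sketch assumes a hypothetical three-row x2 (the question's x2 without its Olfr1193 row); with that input it keeps x2 rows 2 and 3, as in the output shown above.

```r
x1 <- data.frame(
  chr   = c(1, 2, 3),
  start = c(8401, 8001, 4001),
  end   = c(8410, 8020, 4020),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b",
            "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)
# Hypothetical three-row x2 (the question's x2 minus the Olfr1193 row),
# so that row i of x1 can be paired with row i of x2.
x2 <- data.frame(
  chr   = c(1, 2, 3),
  start = c(8405, 8501, 4321),
  end   = c(8420, 8520, 4670),
  Genes = c("Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4",
            "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)

# TRUE when row i of x1 and row i of x2 have no gene in common.
no_overlap <- mapply(function(x, y) length(intersect(x, y)) == 0,
                     strsplit(x1$Genes, ","), strsplit(x2$Genes, ","))

result <- x2[no_overlap, ]
print(result)
```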
Note: if the "Genes" column is a factor, convert it to character first, because strsplit does not accept factors, i.e. strsplit(as.character(x1$Genes), ',').
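A short sketch of the factor case. The two-row x1 here is hypothetical and is built with `stringsAsFactors = TRUE` (the data.frame() default before R 4.0) to reproduce the problem: calling strsplit() on the factor directly stops with a "non-character argument" error, while converting with as.character() first works.

```r
# Hypothetical data with Genes stored as a factor.
x1 <- data.frame(chr   = c(1, 2),
                 Genes = c("Mndal,Mnda", "Cyb5r1,Adipor1"),
                 stringsAsFactors = TRUE)

# Direct use fails: strsplit() only accepts character input.
direct <- try(strsplit(x1$Genes, ","), silent = TRUE)

# Converting to character first gives one vector of genes per row.
genes_list <- strsplit(as.character(x1$Genes), ",")
print(genes_list)
```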
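Update

Based on the new 'x2' dataset, we can merge the two datasets by the 'chr' column, apply strsplit to the 'Genes.x' and 'Genes.y' columns of the merged dataset ('xNew'), build a logical index from whether any element of 'Genes.x' occurs in 'Genes.y', and use that index to subset the 'x2' dataset.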

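 # join x1 with the chr and Genes columns of x2; the two Genes
 # columns become Genes.x (from x1) and Genes.y (from x2)
 xNew <- merge(x1, x2[,c(1,4)], by='chr')
 # TRUE where a merged row has at least one gene in both columns
 indx <- mapply(function(x,y) any(x %in% y),
      strsplit(xNew$Genes.x, ','), strsplit(xNew$Genes.y, ','))
 # merge() sorts by chr, so indx lines up with the rows of x2 only if x2
 # is already ordered by chr and every x2 chr occurs exactly once in x1
 x2[!indx,]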
 # chr start  end                Genes
 #1   1  8861 8868             Olfr1193
 #3   2  8501 8520   Chia,Chi3l3,Chi3l4
 #4   3  4321 4670 Tdpoz4,Tdpoz3,Tdpoz5
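The merge() route works on this data, but indx only lines up with the rows of x2 under the ordering conditions noted in the comments. A sketch of an alternative that does not rely on row order: loop over the rows of x2 directly and compare each one with the x1 genes on the same chr. This is a swapped-in approach, not the answer's method, and it keeps the answer's assumption that genes only count as common when they are on the same chr. The name `has_common` is illustrative.

```r
x1 <- data.frame(
  chr   = c(1, 2, 3),
  start = c(8401, 8001, 4001),
  end   = c(8410, 8020, 4020),
  Genes = c("Mndal,Mnda,Ifi203,Ifi202b",
            "Cyb5r1,Adipor1,Klhl12",
            "Alyref2,Itln1,Cd244"),
  stringsAsFactors = FALSE
)
x2 <- data.frame(
  chr   = c(1, 1, 2, 3),
  start = c(8861, 8405, 8501, 4321),
  end   = c(8868, 8420, 8520, 4670),
  Genes = c("Olfr1193",
            "Mrgprx3-ps,Mrgpra1,Mrgpra2a,Mndal,Mrgpra2b",
            "Chia,Chi3l3,Chi3l4",
            "Tdpoz4,Tdpoz3,Tdpoz5"),
  stringsAsFactors = FALSE
)

# For each x2 row, collect all x1 genes on the same chr and test for overlap.
has_common <- mapply(function(ch, g) {
  x1_genes <- unlist(strsplit(x1$Genes[x1$chr == ch], ","))
  any(strsplit(g, ",")[[1]] %in% x1_genes)
}, x2$chr, x2$Genes)

result <- x2[!has_common, ]
print(result)
```

Rows whose chr does not occur in x1 are simply kept, because the gene lookup for that chr is empty and `%in%` then returns FALSE.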
