How to merge two different groupings with dplyr when they are not disjoint

Problem description

Suppose that I have two sets of identifiers, id1 and id2, in a data frame. How can I create a new identifier id3 that works as follows:

I consider id1 the stricter key, so observations are grouped first by id1 and then by id2. If two sets of rows have different values of id2 but some of their rows share the same id1, the two sets should get the same value of id3 (the exact value in id3 doesn't matter much).

df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7),
                 id3 = c(1, 1, 2, 2, 2, 3))

Here id3 is the desired result. Rows 1 and 2 are grouped together because they have the same id1. Rows 3, 4 and 5 are grouped together because rows 3 and 4 have the same id1 and rows 4 and 5 have the same id2.

Can someone help? I would prefer a dplyr solution that covers the general case, in which the id columns can contain an arbitrary number of distinct values.

Recommended answer

This is a graph theory problem. Each distinct id1 value and each distinct id2 value is a separate node, and each row of df is a link between two of them. You are looking for the (weakly) connected cluster that each id belongs to.

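library(dplyr)
library(igraph)

# Prefix each key so that id1 and id2 values become distinct node names.
df <- df %>% mutate(from = paste0('id1', '_', id1), to = paste0('id2', '_', id2))
# Each row is an undirected edge between an id1 node and an id2 node.
dg <- graph_from_data_frame(df %>% select(from, to), directed = FALSE)
# Look up each row's component by the name of its id1 node.
df <- df %>% mutate(id3 = components(dg)$membership[from])
df %>% select(id1, id2, id3)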

#>   id1 id2 id3
#> 1   1   4   1
#> 2   1   3   1
#> 3   2   1   2
#> 4   2   2   2
#> 5   5   2   2
#> 6   6   7   3
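
To make the connected-components idea concrete, here is a minimal base R sketch that groups the rows with a simple union-find instead of igraph. It is an illustration under stated assumptions rather than part of the answer above: connected_ids and find_root are hypothetical helper names, and the loop is unoptimized, so it is meant for small data only.

```r
# Example data from the question (id3 is left out; it is what we compute).
df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7))

# Hypothetical helper, not part of dplyr or igraph.
connected_ids <- function(id1, id2) {
  # Prefix the keys so that id1 = 2 and id2 = 2 are distinct nodes.
  from <- paste0("id1_", id1)
  to   <- paste0("id2_", id2)
  nodes <- unique(c(from, to))

  # parent[[node]] points toward the representative of the node's set;
  # initially every node is its own representative.
  parent <- setNames(nodes, nodes)

  # Follow parent links until reaching a node that is its own parent.
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  # Each row is an edge: merge the sets containing its two endpoints.
  for (i in seq_along(from)) {
    root_from <- find_root(from[i])
    root_to   <- find_root(to[i])
    if (root_from != root_to) parent[[root_to]] <- root_from
  }

  # Label each row by the representative of its id1 node,
  # numbering the groups in order of first appearance.
  roots <- vapply(from, find_root, character(1), USE.NAMES = FALSE)
  match(roots, unique(roots))
}

df$id3 <- connected_ids(df$id1, df$id2)
df
```

On the example data this yields the same grouping as the igraph solution: rows 1 and 2 share one id3, rows 3 to 5 share another, and row 6 gets its own. For larger data, components() from igraph is likely the better choice.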


09-14 14:30