Problem description
Based on an almost identical question, I am trying to create unique IDs based on several columns, where rows should be grouped into the same ID if "there exists a path through any combination of the columns". The difference is that I have NAs that should not be used to link rows.
The goal is for R to create id3 based on id1 and id2 (minimal example below).
For example, id1=1 is related to a of id2 (its other row has NA for id2). But id1=2 is also related to a, so both belong to one group (id3=group1). And since id1=2 and id1=3 share id2=c, id1=3 also belongs to that group (id3=group1). The values of the tuple ((1,2,3),('a','c','d')) appear nowhere else, so no other row belongs to that group (which is labeled group1 generically).
library(igraph)
df = data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5,6,6,NA,NA),
                id2 = c('a',NA,'a','c','c','d','x',NA,'y','z','x','z',NA,NA),
                id3 = c(rep('group1',6), rep('group2',6),NA,NA))
My solution fails because of the NA values:
g <- graph_from_data_frame(df, FALSE)
cg <- clusters(g)$membership
df$id4 <- cg[df$id1]
df
Observations (rows) 2 and 8 are linked because both have NA for id2, but this should be ignored. Is there a way to do this?
Recommended answer
You can try the code below using components + membership + merge:
g <- graph_from_data_frame(na.omit(df))
merge(
df,
transform(
rev(stack(membership(components(g))[V(g)[names(V(g)) %in% df$id1]])),
values = paste0("group", values)
),
by.x = "id1",
by.y = "ind",
all = TRUE
)
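The one-liner above packs several steps together. The sketch below spells them out under the same idea (drop incomplete rows so NA never becomes a shared vertex, then look up each id1 in the component membership). The intermediate names edges, memb and grp are my own, the graph is built as undirected (the answer's default directed graph works too, since components() uses weak connectivity by default), and df is rebuilt without the expected id3 column. The exact group numbers follow igraph's component numbering and are not guaranteed by this sketch.

```r
library(igraph)

# The question's data, without the expected id3 column
df <- data.frame(id1 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, NA, NA),
                 id2 = c("a", NA, "a", "c", "c", "d", "x", NA, "y", "z", "x", "z", NA, NA))

# 1. Drop rows with an NA id, so NA can never act as a shared vertex
edges <- na.omit(df[c("id1", "id2")])

# 2. Graph whose vertices are the id1 and id2 values; each remaining row is an edge
g <- graph_from_data_frame(edges, directed = FALSE)

# 3. Connected components, keyed explicitly by vertex name ("1", "a", ...)
memb <- setNames(components(g)$membership, V(g)$name)

# 4. Look up each row's id1 by name (not by position); an NA id1, or an id1
#    that never occurs in a complete row, yields NA
grp <- unname(memb[as.character(df$id1)])
df$id3 <- ifelse(is.na(grp), NA, paste0("group", grp))
df
```

Looking up by name matters: the question's cg[df$id1] indexes by position, which only lines up because the id1 values 1 to 6 happen to be the first vertices that graph_from_data_frame() creates.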
Or, using decompose + merge:
subg <- decompose(graph_from_data_frame(na.omit(df)))
merge(df,
do.call(
rbind,
Map(
function(x, y) cbind(setNames(unique(as_data_frame(x)[1]), "id1"), id3 = y),
subg,
paste0("group", seq_along(subg))
)
),
by = "id1",
all = TRUE
)
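This gives you the following (the output assumes df holds only the id1 and id2 columns; with the first variant the new column is named values instead of id3):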
id1 id2 id3
1 1 a group1
2 1 <NA> group1
3 2 a group1
4 2 c group1
5 3 c group1
6 3 d group1
7 4 x group2
8 4 <NA> group2
9 5 y group2
10 5 z group2
11 6 x group2
12 6 z group2
13 NA <NA> <NA>
14 NA <NA> <NA>
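For comparison, the grouping rule from the question (rows belong together if a chain of shared, non-NA id1 or id2 values connects them) can also be computed without igraph. The sketch below swaps in a simple min-label propagation in base R instead of a graph library; group_rows() is a hypothetical helper name, and it is meant to illustrate the rule rather than replace the answer on large data, since the number of passes grows with the length of the longest chain.

```r
# Hypothetical helper: label rows connected through shared, non-NA values
# of id1 or id2 (min-label propagation in base R, no igraph needed)
group_rows <- function(id1, id2) {
  grp <- seq_along(id1)                # every row starts in its own group
  keys <- list(id1, id2)
  repeat {
    old <- grp
    for (key in keys) {
      has <- !is.na(key)               # NA never links rows
      # all rows sharing a key value take the smallest label among them
      grp[has] <- ave(grp[has], key[has], FUN = min)
    }
    if (all(grp == old)) break         # no label changed: groups are stable
  }
  linked <- !(is.na(id1) & is.na(id2)) # rows with both ids NA get no group
  out <- rep(NA_character_, length(id1))
  # renumber the surviving labels as group1, group2, ... in order of appearance
  out[linked] <- paste0("group", match(grp[linked], unique(grp[linked])))
  out
}

df <- data.frame(id1 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, NA, NA),
                 id2 = c("a", NA, "a", "c", "c", "d", "x", NA, "y", "z", "x", "z", NA, NA))
df$id3 <- group_rows(df$id1, df$id2)
df
```

One behavioural difference from the na.omit() approach: a row whose only non-NA value appears nowhere else (for example id1=7 with id2=NA) gets its own group here, whereas the merge-based answers give it NA.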