R: find groups of tuples, ignoring NA

Problem description

Based on an almost identical question, I am trying to create unique IDs based on several columns, where rows should be grouped into the same ID if "there exists a path through any combination of the columns". The difference is that I have NAs that should not be used to link rows:

The goal is for R to create id3 from id1 and id2. Minimal example:

For example, id1=1 is related to a of id2 (its other id2 value is NA). But id1=2 is also related to a, so both belong to one group (id3=group1). And since id1=2 and id1=3 share id2=c, id1=3 also belongs to that group. The values of the tuple ((1,2,3),('a','c','d')) appear nowhere else, so no other row belongs to that group (which is labeled group1 generically).
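
The grouping rule described above is a connected-components problem: two rows land in the same group if a chain of shared id1 or id2 values connects them. As a rough base-R sketch of that idea (not the igraph approach used in the answer), here is a simple union-find (disjoint-set) pass over the row indices, without the usual path-compression optimisations. It skips NA values so they never link rows. The helper name group_rows() is hypothetical and not part of the question or any package:

```r
# Hypothetical helper (base R only): label rows connected through shared
# non-NA values of id1 or id2; NA never links two rows.
group_rows <- function(id1, id2) {
  n <- length(id1)
  parent <- seq_len(n)                    # union-find forest over row indices

  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Union all rows that share the same non-NA value of one key column
  union_by_key <- function(key) {
    ok <- which(!is.na(key))              # NA rows are skipped entirely
    for (rows in split(ok, key[ok])) {
      if (length(rows) < 2) next
      root <- find_root(rows[1])
      for (r in rows[-1]) {
        rr <- find_root(r)
        if (rr != root) parent[rr] <<- root   # attach the other tree to root
      }
    }
  }

  union_by_key(id1)
  union_by_key(id2)

  roots <- vapply(seq_len(n), find_root, integer(1))
  all_na <- is.na(id1) & is.na(id2)       # rows with no usable key get no group
  roots[all_na] <- NA
  ids <- match(roots, unique(roots[!all_na]))  # number groups by first appearance
  labels <- paste0("group", ids)
  labels[is.na(ids)] <- NA
  labels
}

df <- data.frame(id1 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, NA, NA),
                 id2 = c("a", NA, "a", "c", "c", "d", "x", NA, "y", "z", "x", "z", NA, NA))
df$id3 <- group_rows(df$id1, df$id2)
df
```

Rows where both keys are NA stay unlabelled, which matches the expected id3 in the example. Group numbers follow the row order in which each group first appears.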

library(igraph)
# id3 holds the expected grouping that should be reproduced from id1 and id2
df <- data.frame(id1 = c(1,1,2,2,3,3,4,4,5,5,6,6,NA,NA),
                 id2 = c('a',NA,'a','c','c','d','x',NA,'y','z','x','z',NA,NA),
                 id3 = c(rep('group1',6), rep('group2',6),NA,NA))

My solution fails because of the NA values.

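g <- graph_from_data_frame(df, FALSE)   # undirected graph, edges id1 -- id2; NAs are kept
cg <- components(g)$membership          # component of each vertex (components() replaces the deprecated clusters())
df$id4 <- cg[df$id1]
df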

Observations (rows) 2 and 8 are linked because both have NA for id2, but this should be ignored. Is there a way to do this?
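
The underlying pitfall is easy to reproduce in base R: used as a lookup key, NA matches NA. match() treats NA as equal to NA (while NA == NA is itself NA), so any approach that links rows through equal values has to drop the NA keys explicitly. The snippet below uses only the id2 column from the example. It illustrates the general pitfall and is not a trace of what igraph does internally:

```r
# id2 column from the example above
id2 <- c("a", NA, "a", "c", "c", "d", "x", NA, "y", "z", "x", "z", NA, NA)
na_rows <- which(is.na(id2))                 # rows 2, 8, 13, 14

# match() treats NA as a matchable value: each NA row "finds" the first NA (row 2)
first_with_same_key <- match(id2, id2)
first_with_same_key[na_rows]                 # 2 2 2 2

# Plain comparison does not: NA == NA is NA, not TRUE
NA == NA

# Dropping NA keys before grouping removes the spurious link between rows 2 and 8
keep <- !is.na(id2)
split(which(keep), id2[keep])                # row indices per non-NA id2 value
```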

Recommended answer

You can try the code below, using components() + membership() + merge():

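library(igraph)

# df also carries the expected id3, so keep only the key columns, then drop
# rows with NA so that NA values cannot link otherwise unrelated rows
d <- df[c("id1", "id2")]
g <- graph_from_data_frame(na.omit(d))

# Component number of every id1 vertex -> data frame with columns ind, values
memb <- rev(stack(membership(components(g))[V(g)[names(V(g)) %in% d$id1]]))
memb <- transform(memb, id3 = paste0("group", values))[c("ind", "id3")]

# all = TRUE keeps the rows whose id1 is NA (they get id3 = NA)
merge(d, memb, by.x = "id1", by.y = "ind", all = TRUE)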

Or, using decompose() + merge():

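library(igraph)

# Same idea: key columns only, NA rows dropped before building the graph
d <- df[c("id1", "id2")]
subg <- decompose(graph_from_data_frame(na.omit(d)))

# For each connected component: its distinct id1 values (the "from" column of
# its edge list), labelled group1, group2, ... in component order
merge(d,
  do.call(
    rbind,
    Map(
      function(x, y) cbind(setNames(unique(as_data_frame(x)[1]), "id1"), id3 = y),
      subg,
      paste0("group", seq_along(subg))
    )
  ),
  by = "id1",
  all = TRUE
)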

This gives:

   id1  id2    id3
1    1    a group1
2    1 <NA> group1
3    2    a group1
4    2    c group1
5    3    c group1
6    3    d group1
7    4    x group2
8    4 <NA> group2
9    5    y group2
10   5    z group2
11   6    x group2
12   6    z group2
13  NA <NA>   <NA>
14  NA <NA>   <NA>
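
To sanity-check a result like this, one option is to confirm that each non-NA id1 value and each non-NA id2 value falls into exactly one group. The sketch below assumes the result is stored in a data frame named res (a hypothetical name; the snippets above only print the merged result, so here it is typed in from the output). The helper groups_per_value() is likewise made up for illustration. It only catches groups that should have been merged; it cannot tell whether two groups were merged unnecessarily.

```r
# Hypothetical name: res holds the result shown above (typed in by hand here)
res <- data.frame(
  id1 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, NA, NA),
  id2 = c("a", NA, "a", "c", "c", "d", "x", NA, "y", "z", "x", "z", NA, NA),
  id3 = c(rep("group1", 6), rep("group2", 6), NA, NA)
)

# Hypothetical helper: number of distinct groups each non-NA key value falls into
groups_per_value <- function(key, group) {
  ok <- !is.na(key)
  tapply(group[ok], key[ok], function(g) length(unique(g)))
}

# Both should be TRUE: no id1 or id2 value is split across two groups
all(groups_per_value(res$id1, res$id3) == 1)
all(groups_per_value(res$id2, res$id3) == 1)
```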
