Compare values across multiple R data frames and update missing values

Problem description

I have 3 data frames. The first df contains one column, Name:

df 1
    Name
    A
    B
    C
    D
    E
    F
    G
    H
    I
    J
    K

The second df contains two columns, Name and Counts, but some of the Names from the first df may be missing from it.

df 2 -
  Name   Counts
    A    12
    B    23
    C    34
    D    56
    E    34
    K    44

I want to compare the Names in the second df with those in the first df. If none of the names are missing, that's fine. If any name is missing, that name and its count have to be filled in from the third df, which will always have every name and count available.
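A minimal base-R sketch of the check described above. It assumes df1 and df2 are plain data.frame objects rebuilt from the tables in the question; the variable name missing_names is hypothetical. setdiff() returns the names of df1 that df2 lacks, and an empty result means nothing needs filling.

```r
# Hypothetical reconstructions of df 1 and df 2 from the question
df1 <- data.frame(Name = LETTERS[1:11], stringsAsFactors = FALSE)  # "A" through "K"
df2 <- data.frame(Name   = c("A", "B", "C", "D", "E", "K"),
                  Counts = c(12, 23, 34, 56, 34, 44),
                  stringsAsFactors = FALSE)

# Names that appear in df1 but not in df2, in df1's order
missing_names <- setdiff(df1$Name, df2$Name)

if (length(missing_names) == 0) {
  message("No names are missing from df2")
} else {
  message("Missing from df2: ", paste(missing_names, collapse = ", "))
}
```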

df 3 -
 Name   Counts
    A    34
    B    45
    C    34
    D    56
    E    67
    F    435
    G    45
    H    76
    I    76
    J    88
    K    90

So in the example above, since F, G, H, I and J are missing from the second df, their info should be added from df 3.

The second df should then be updated to:

Name   Counts
    A    12
    B    23
    C    34
    D    56
    E    34
    F    435
    G    45
    H    76
    I    76
    J    88
    K    44
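One way to produce this updated table in base R, sketched under the assumption that df1, df2 and df3 are plain data frames with the columns shown; missing_names is a hypothetical helper variable. The code takes the missing names, appends the matching rows of df3 to df2, and sorts by Name. Rows already present in df2 keep their own counts, so K stays at 44 rather than taking 90 from df3.

```r
# Hypothetical reconstructions of the three data frames from the question
df1 <- data.frame(Name = LETTERS[1:11], stringsAsFactors = FALSE)
df2 <- data.frame(Name   = c("A", "B", "C", "D", "E", "K"),
                  Counts = c(12, 23, 34, 56, 34, 44),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Name   = LETTERS[1:11],
                  Counts = c(34, 45, 34, 56, 67, 435, 45, 76, 76, 88, 90),
                  stringsAsFactors = FALSE)

# Names in df1 that df2 lacks
missing_names <- setdiff(df1$Name, df2$Name)

# Append only those rows from df3; existing df2 rows are left untouched
df2 <- rbind(df2, df3[df3$Name %in% missing_names, ])

# Sort by Name and reset row names to match the layout in the question
df2 <- df2[order(df2$Name), ]
rownames(df2) <- NULL
print(df2)
```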

Any help would be appreciated.

Thanks

Recommended answer

You could do:

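library(data.table)
setDT(DF1); setDT(DF2); setDT(DF3)   # convert the data frames to data.tables by reference

# Stack DF2 over DF3 and keep the first row per Name (so DF2's counts win),
# then look up each Name of DF1 and store the matching Counts in a new column n
DF1[, n := unique(rbind(DF2, DF3), by="Name")[.(.SD$Name), on=.(Name), x.Counts]]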

This adds a column, n, to DF1:

    Name   n
 1:    A  12
 2:    B  23
 3:    C  34
 4:    D  56
 5:    E  34
 6:    F 435
 7:    G  45
 8:    H  76
 9:    I  76
10:    J  88
11:    K  44

You could instead do merge(DF1, unique(rbind(DF2, DF3), by="Name"), all.x=TRUE), though that creates a new table instead of adding a column to an existing one. The dplyr analogue of this merge is left_join(DF1, bind_rows(DF2, DF3) %>% distinct(Name, .keep_all = TRUE)). Without .keep_all = TRUE, distinct(Name) would drop the Counts column.
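For comparison, here is a base-R version of the same left join. It assumes plain data frames named as in the answer; both and first_per_name are hypothetical helper names. !duplicated() plays the role of unique(..., by="Name"), and because DF2 is stacked first, its counts take precedence. merge(..., all.x = TRUE) keeps every row of DF1 and, by default, sorts the result by Name.

```r
# Hypothetical reconstructions of the three tables from the question
DF1 <- data.frame(Name = LETTERS[1:11], stringsAsFactors = FALSE)
DF2 <- data.frame(Name   = c("A", "B", "C", "D", "E", "K"),
                  Counts = c(12, 23, 34, 56, 34, 44),
                  stringsAsFactors = FALSE)
DF3 <- data.frame(Name   = LETTERS[1:11],
                  Counts = c(34, 45, 34, 56, 67, 435, 45, 76, 76, 88, 90),
                  stringsAsFactors = FALSE)

# Stack the sources with DF2 first so its counts win
both <- rbind(DF2, DF3)

# Keep the first row per Name (base-R counterpart of unique(..., by = "Name"))
first_per_name <- both[!duplicated(both$Name), ]

# Left join: every row of DF1 is kept
res <- merge(DF1, first_per_name, by = "Name", all.x = TRUE)
print(res)
```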

How it works

- DF = rbind(DF2, DF3) appends the two source tables, with DF2's rows first
- uDF = unique(DF, by="Name") keeps the first row for each Name, so a count from DF2 takes precedence over the one in DF3
- DF1[, n := z] adds column n with values z to DF1
- z = x[i, on=, x.v] uses i to look up rows of x, then returns column v, where...
  - x = uDF
  - v = Counts
  - i = .SD$Name is the vector of names found in DF1

.SD in j of DT[i, j] refers to DT itself, the "Subset of Data".
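The same steps can be mirrored in base R without data.table, as a sketch assuming plain data frames named as in the answer. match() performs the lookup that x[i, on=, x.v] does, and a name with no match would get NA.

```r
# Hypothetical reconstructions of the three tables from the question
DF1 <- data.frame(Name = LETTERS[1:11], stringsAsFactors = FALSE)
DF2 <- data.frame(Name   = c("A", "B", "C", "D", "E", "K"),
                  Counts = c(12, 23, 34, 56, 34, 44),
                  stringsAsFactors = FALSE)
DF3 <- data.frame(Name   = LETTERS[1:11],
                  Counts = c(34, 45, 34, 56, 67, 435, 45, 76, 76, 88, 90),
                  stringsAsFactors = FALSE)

DF  <- rbind(DF2, DF3)             # step 1: append the two source tables
uDF <- DF[!duplicated(DF$Name), ]  # step 2: first row per Name (DF2 rows come first)

# Steps 3-4: look up each Name of DF1 in uDF and add the matching Counts as column n
DF1$n <- uDF$Counts[match(DF1$Name, uDF$Name)]
print(DF1)
```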