Problem description
I have 3 data frames. The first df contains one column, Name:
df 1
Name
A
B
C
D
E
F
G
H
I
J
K
The second df contains two columns, Name and Counts, but some of the names in the first df may be missing from it.
df 2 -
Name Counts
A 12
B 23
C 34
D 56
E 34
K 44
I want to compare the names in the second df with those in the first df. If none of the names are missing, fine. If any name is missing, that name and its count have to be filled in from the third df, which always has every name and its count.
df 3 -
Name Counts
A 34
B 45
C 34
D 56
E 67
F 435
G 45
H 76
I 76
J 88
K 90
So in the above example, since F, G, H, I and J are missing from the second df, their info should be added from df 3,
and the second df should be updated as:
Name Counts
A 12
B 23
C 34
D 56
E 34
F 435
G 45
H 76
I 76
J 88
K 44
Any help is appreciated.
Thanks.
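To make the example reproducible, the three tables above can be rebuilt in R. This is a sketch that assumes the names DF1, DF2 and DF3 used in the answer below and stores the counts as numeric columns. It also shows the comparison the question asks about: setdiff() lists the names of DF1 that have no row in DF2.

```r
# Example data from the question, rebuilt as plain base R data frames
DF1 <- data.frame(Name = LETTERS[1:11])  # names A through K
DF2 <- data.frame(Name   = c("A", "B", "C", "D", "E", "K"),
                  Counts = c(12, 23, 34, 56, 34, 44))
DF3 <- data.frame(Name   = LETTERS[1:11],
                  Counts = c(34, 45, 34, 56, 67, 435, 45, 76, 76, 88, 90))

# Names present in DF1 but absent from DF2: these must be filled from DF3
missing_names <- setdiff(DF1$Name, DF2$Name)
print(missing_names)  # [1] "F" "G" "H" "I" "J"
```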
Recommended answer
You can do
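library(data.table)
setDT(DF1); setDT(DF2); setDT(DF3)  # convert the three data frames to data.tables in place
# Stack DF2 over DF3, keep the first row per Name (so DF2's count wins), then look up each Name of DF1
DF1[, n := unique(rbind(DF2, DF3), by="Name")[.(.SD$Name), on=.(Name), x.Counts]]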
which adds a column to DF1:
Name n
1: A 12
2: B 23
3: C 34
4: D 56
5: E 34
6: F 435
7: G 45
8: H 76
9: I 76
10: J 88
11: K 44
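You could instead do merge(DF1, unique(rbind(DF2, DF3), by="Name"), all.x=TRUE), though that would create a new table instead of adding a column to an existing table. The dplyr analogue of this merge is left_join(DF1, bind_rows(DF2, DF3) %>% distinct(Name, .keep_all = TRUE), by = "Name"). Note the .keep_all = TRUE: without it, distinct(Name) keeps only the Name column and the Counts are lost.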
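For readers without data.table or dplyr, the merge approach above has a close base R analogue. This is a sketch under stated assumptions: the data frames are rebuilt from the question's tables, and base R's duplicated() stands in for unique(..., by = "Name"). Like the merge version, it returns a new table rather than adding a column, and base merge() sorts the result by Name by default.

```r
DF1 <- data.frame(Name = LETTERS[1:11])
DF2 <- data.frame(Name   = c("A", "B", "C", "D", "E", "K"),
                  Counts = c(12, 23, 34, 56, 34, 44))
DF3 <- data.frame(Name   = LETTERS[1:11],
                  Counts = c(34, 45, 34, 56, 67, 435, 45, 76, 76, 88, 90))

# Stack DF2 on top of DF3 so that, for each Name, DF2's row comes first
both <- rbind(DF2, DF3)
# Keep only the first row per Name: DF2's count if present, otherwise DF3's
lookup <- both[!duplicated(both$Name), ]
# Left join: every Name in DF1 is kept, with the matching Counts from lookup
result <- merge(DF1, lookup, by = "Name", all.x = TRUE)
print(result)  # Counts for A..K: 12 23 34 56 34 435 45 76 76 88 44
```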
How it works
- DF = rbind(DF2, DF3) appends the two source tables
- uDF = unique(DF, by="Name") keeps the first row for each Name; because DF2's rows come first, DF2's count is kept wherever it has one, and DF3 supplies the rest
- DF1[, n := z] adds column n with values z to DF1
- z = x[i, on=, x.v] uses i to look up rows of x, then returns column v, where...
  - x = uDF
  - v = Counts
  - i = .SD$Name is the vector of names found in DF1
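The steps listed above can be traced one at a time in base R, which may help when reading the data.table one-liner. This is a minimal sketch assuming plain data frames built from the question's tables: duplicated() plays the role of unique(by = "Name"), and match() plays the role of the on= join. Unlike :=, base R's `$<-` follows copy-on-modify semantics instead of updating DF1 by reference.

```r
DF1 <- data.frame(Name = LETTERS[1:11])
DF2 <- data.frame(Name   = c("A", "B", "C", "D", "E", "K"),
                  Counts = c(12, 23, 34, 56, 34, 44))
DF3 <- data.frame(Name   = LETTERS[1:11],
                  Counts = c(34, 45, 34, 56, 67, 435, 45, 76, 76, 88, 90))

# DF = rbind(DF2, DF3): append the two source tables, DF2 first
DF <- rbind(DF2, DF3)
# uDF = unique(DF, by = "Name"): keep the first row per Name
uDF <- DF[!duplicated(DF$Name), ]
# z = x[i, on=, x.v]: for each name in DF1 (i), find its row in uDF (x), take Counts (v)
z <- uDF$Counts[match(DF1$Name, uDF$Name)]
# DF1[, n := z]: add column n to DF1, keeping DF1's original row order
DF1$n <- z
print(DF1)  # n for A..K: 12 23 34 56 34 435 45 76 76 88 44
```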
inj
ofDT[i, j]
refers toDT
itself, the "Subset of Data".这篇关于比较R数据帧和A的多列中的值更新缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!