I have two data tables, shown below:
bigrams
w1w2 freq w1 w2
common names 1 common names
department of 4 department of
family name 6 family name
library(data.table)
bigrams = setDT(structure(list(w1w2 = c("common names", "department of", "family name"
), freq = c(1L, 4L, 6L), w1 = c("common", "department", "family"
), w2 = c("names", "of", "name")), .Names = c("w1w2", "freq",
"w1", "w2"), row.names = c(NA, -3L), class = "data.frame"))
unigrams
w1 freq
common 2
department 3
family 4
name 5
names 1
of 9
unigrams = setDT(structure(list(w1 = c("common", "department", "family", "name",
"names", "of"), freq = c(2L, 3L, 4L, 5L, 1L, 9L)), .Names = c("w1",
"freq"), row.names = c(NA, -6L), class = "data.frame"))
Desired output
w1w2 freq w1 w2 w1freq w2freq
common names 1 common names 2 1
department of 4 department of 3 9
family name 6 family name 4 5
What I have done so far
setkey(bigrams, w1)
setkey(unigrams, w1)
result <- bigrams[unigrams]
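This gives me an i.freq column for w1, but when I try to do the same for w2, the i.freq column is updated to reflect the frequency of w2. How can I get the frequencies of both w1 and w2 in separate columns?
Note: I have already looked at the solutions in "data.table Lookup value and translate" and "Modify column of a data.table based on another column and add the new column".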
Best answer
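You can do two joins. Since v1.9.6 of data.table, the on= argument lets you join on columns with different names, so the second join can match w2 in bigrams against w1 in unigrams.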
library(data.table)
bigrams[unigrams, on=c("w1"), nomatch = 0][unigrams, on=c(w2 = "w1"), nomatch = 0]
w1w2 freq w1 w2 i.freq i.freq.1
1: family name 6 family name 4 5
2: common names 1 common names 2 1
3: department of 4 department of 3 9
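The chained joins above leave the lookup columns named i.freq and i.freq.1 and reorder the rows. As an alternative sketch (not part of the original answer), an update join can add each lookup by reference under the names used in the desired output. It assumes data.table v1.9.6 or later for on=; the column names w1freq and w2freq are taken from the question's desired output, and the tables are rebuilt here with data.table() only to keep the example self-contained.

```r
library(data.table)

# Same data as in the question, rebuilt so the example runs on its own
bigrams <- data.table(
  w1w2 = c("common names", "department of", "family name"),
  freq = c(1L, 4L, 6L),
  w1   = c("common", "department", "family"),
  w2   = c("names", "of", "name")
)
unigrams <- data.table(
  w1   = c("common", "department", "family", "name", "names", "of"),
  freq = c(2L, 3L, 4L, 5L, 1L, 9L)
)

# Update join: for each row of bigrams matched on w1, copy the unigram freq
bigrams[unigrams, on = "w1", w1freq := i.freq]

# Same lookup, but matching bigrams$w2 against unigrams$w1
bigrams[unigrams, on = c(w2 = "w1"), w2freq := i.freq]

print(bigrams)
```

Unlike the chained joins, this keeps the original row order of bigrams, and a word with no match in unigrams gets NA in the new column instead of having its row dropped.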
A similar question, "r - Look up data in a data table and add it to a new column", can be found on Stack Overflow: https://stackoverflow.com/questions/36588019/