I have two data tables, shown below:
bigrams

 w1w2           freq   w1          w2
 common names   1      common      names
 department of  4      department  of
 family name    6      family      name

bigrams = setDT(structure(list(w1w2 = c("common names", "department of", "family name"
), freq = c(1L, 4L, 6L), w1 = c("common", "department", "family"
), w2 = c("names", "of", "name")), .Names = c("w1w2", "freq",
"w1", "w2"), row.names = c(NA, -3L), class = "data.frame"))

unigrams
w1            freq
common        2
department    3
family        4
name          5
names         1
of            9

unigrams = setDT(structure(list(w1 = c("common", "department", "family", "name",
"names", "of"), freq = c(2L, 3L, 4L, 5L, 1L, 9L)), .Names = c("w1",
"freq"), row.names = c(NA, -6L), class = "data.frame"))

Desired output
 w1w2           freq   w1          w2      w1freq    w2freq
 common names   1      common      names   2         1
 department of  4      department  of      3         9
 family name    6      family      name    4         5

What I have done so far
setkey(bigrams, w1)
setkey(unigrams, w1)
result <- bigrams[unigrams]

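This gives me the frequency of w1 in the i.freq column, but when I try to do the same for w2, the i.freq column is updated to reflect the frequency of w2 instead.

How can I get the frequencies of both w1 and w2, each in its own column?

Note: I have already looked at the solutions to "data.table Lookup value and translate" and "Modify column of a data.table based on another column and add the new column".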

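Best answer

You can do two joins. In v1.9.6 of data.table, the on= argument lets you specify the join columns, including columns whose names differ between the two tables.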

library(data.table)

bigrams[unigrams, on=c("w1"), nomatch = 0][unigrams, on=c(w2 = "w1"), nomatch = 0]

            w1w2 freq         w1    w2 i.freq i.freq.1
1:   family name    6     family  name      4        5
2:  common names    1     common names      2        1
3: department of    4 department    of      3        9
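
The chained join above returns the frequencies as i.freq and i.freq.1, and the rows come back in the order of unigrams. If you would rather get the column names from the desired output and keep the original row order, one possible variation is an update join with :=. This is only a sketch and not part of the original answer: the names w1freq and w2freq are taken from the question, the tables are rebuilt here so the snippet runs on its own, and it assumes a data.table version that supports on= (v1.9.6 or later).

```r
library(data.table)

# Rebuild the two example tables from the question
bigrams <- data.table(
  w1w2 = c("common names", "department of", "family name"),
  freq = c(1L, 4L, 6L),
  w1   = c("common", "department", "family"),
  w2   = c("names", "of", "name")
)
unigrams <- data.table(
  w1   = c("common", "department", "family", "name", "names", "of"),
  freq = c(2L, 3L, 4L, 5L, 1L, 9L)
)

# Update join: match bigrams$w1 against unigrams$w1 and copy the
# unigram frequency (i.freq) into a new column, by reference
bigrams[unigrams, on = "w1", w1freq := i.freq]

# Same lookup for the second word: bigrams$w2 is matched against unigrams$w1
bigrams[unigrams, on = c(w2 = "w1"), w2freq := i.freq]

print(bigrams)
```

Because := modifies bigrams in place, no reassignment is needed. A bigram word with no matching unigram would simply get NA in the new column.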

Source: a similar question on Stack Overflow, "r - look up data in a data table and add it to a new column": https://stackoverflow.com/questions/36588019/
