I have two data frames, one for stores and one for sales:

store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"))
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))

store
#StoreID   StoreName
#      1   McDonalds
#      2         A&W
#      3 Burger King
#      4     Wendy's

sales
#StoreID ItemID SalesQty
#      1      2       10
#      2      2       20
#      1      3       30
#      1      4       40
#      2      4       50
#      2      5       60

I want to merge them so that I can see the StoreName for each sales transaction:
merged <- merge(sales, store, by = "StoreID")

merged
#StoreID ItemID SalesQty StoreName
#      1      2       10 McDonalds
#      1      3       30 McDonalds
#      1      4       40 McDonalds
#      2      2       20       A&W
#      2      4       50       A&W
#      2      5       60       A&W

Now I want to know how many distinct items each StoreName in the merged data frame has sold:
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

#A&W Burger King   McDonalds     Wendy's
#  3          NA           3          NA

My question is: why does the tapply result show "Burger King" and "Wendy's" even though they are not in the merged data frame?

Best answer

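This is because store$StoreName is a factor. The merge drops the rows for stores that have no match in sales, but the factor still carries all four levels, and tapply returns one entry per level, giving NA for the empty ones. Setting the argument stringsAsFactors to FALSE when creating the store data frame keeps StoreName as a character vector, so store names with no matching rows in sales no longer appear after the merge.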
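To check that the unused factor levels are what tapply is reporting, here is a minimal sketch. It rebuilds the question's data with stringsAsFactors = TRUE set explicitly, because R 4.0.0 and later default to FALSE and would not reproduce the behaviour otherwise. The variable names present, all_levels and unused are illustrative only.

```r
# Rebuild the question's data with factors forced on
# (this was the data.frame() default before R 4.0.0).
store <- data.frame(StoreID = c(1, 2, 3, 4),
                    StoreName = c("McDonalds", "A&W", "Burger King", "Wendy's"),
                    stringsAsFactors = TRUE)
sales <- data.frame(StoreID = c(1, 2, 1, 1, 2, 2),
                    ItemID = c(2, 2, 3, 4, 4, 5),
                    SalesQty = c(10, 20, 30, 40, 50, 60))
merged <- merge(sales, store, by = "StoreID")

# Store names that actually occur in the merged rows
present <- unique(as.character(merged$StoreName))

# Levels the factor still carries, including stores with no sales
all_levels <- levels(merged$StoreName)

# Levels with no rows behind them; tapply() reports these as NA
unused <- setdiff(all_levels, present)
print(unused)
```

If unused is non-empty, the extra names in the tapply result come from the factor's levels and not from the rows of merged.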

sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"), stringsAsFactors = FALSE)
merged <- merge(sales, store, by = "StoreID")
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

  #A&W McDonalds
  #  3         3
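If StoreName needs to stay a factor, one alternative (not part of the original answer) is to drop the unused levels after merging with base R's droplevels(). The sketch below again sets stringsAsFactors = TRUE explicitly so that it behaves the same on any R version. The name distinct_items is illustrative.

```r
# Same data as in the question, with StoreName kept as a factor.
store <- data.frame(StoreID = c(1, 2, 3, 4),
                    StoreName = c("McDonalds", "A&W", "Burger King", "Wendy's"),
                    stringsAsFactors = TRUE)
sales <- data.frame(StoreID = c(1, 2, 1, 1, 2, 2),
                    ItemID = c(2, 2, 3, 4, 4, 5),
                    SalesQty = c(10, 20, 30, 40, 50, 60))
merged <- merge(sales, store, by = "StoreID")

# Remove factor levels that no longer have any rows in the merged data frame
merged <- droplevels(merged)

# Count distinct items per store; only stores with sales remain as groups
distinct_items <- tapply(merged$ItemID, merged$StoreName,
                         FUN = function(x) length(unique(x)))
print(distinct_items)
```

Both approaches remove the NA entries. droplevels() is a reasonable choice when other code relies on StoreName being a factor.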

Regarding "r - Inner join of two data frames still shows all values", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42128502/

10-11 03:15