I have two data frames, one for stores and one for sales:
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"))
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))
store
#StoreID   StoreName
#      1   McDonalds
#      2         A&W
#      3 Burger King
#      4     Wendy's
sales
#StoreID ItemID SalesQty
#      1      2       10
#      2      2       20
#      1      3       30
#      1      4       40
#      2      4       50
#      2      5       60
I want to merge them so that I can see the StoreName for each sales transaction:
merged <- merge(sales, store, by = "StoreID")
merged
#StoreID ItemID SalesQty StoreName
#      1      2       10 McDonalds
#      1      3       30 McDonalds
#      1      4       40 McDonalds
#      2      2       20       A&W
#      2      4       50       A&W
#      2      5       60       A&W
Now I want to know how many distinct items each StoreName in the merged data frame has sold:
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
#        A&W Burger King   McDonalds     Wendy's
#          3          NA           3          NA
My question is: why does the tapply result show "Burger King" and "Wendy's" even though they are not in the merged data frame?
Best answer
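This happens because store$StoreName is a factor. merge() drops the unmatched rows but keeps all four factor levels, and tapply() reports every level, returning NA for the levels that have no data. Setting the stringsAsFactors argument to FALSE when creating the store data frame ensures that store names with no matching rows in sales are dropped during the merge. (Since R 4.0.0, FALSE is already the default for data.frame().)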
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"), stringsAsFactors = FALSE)
merged <- merge(sales, store, by = "StoreID")
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
#      A&W McDonalds
#        3         3
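If StoreName has to stay a factor, an alternative not mentioned in the answer is to drop the unused levels after merging. The following is a minimal sketch of that idea, assuming base R only; stringsAsFactors = TRUE is set explicitly here to reproduce the behaviour from the question on any R version, and merged_clean and counts are illustrative names.

```r
# Reproduce the question's setup with StoreName explicitly stored as a factor
# (this was the data.frame() default before R 4.0.0).
store <- data.frame(StoreID = c(1, 2, 3, 4),
                    StoreName = c("McDonalds", "A&W", "Burger King", "Wendy's"),
                    stringsAsFactors = TRUE)
sales <- data.frame(StoreID = c(1, 2, 1, 1, 2, 2),
                    ItemID = c(2, 2, 3, 4, 4, 5),
                    SalesQty = c(10, 20, 30, 40, 50, 60))
merged <- merge(sales, store, by = "StoreID")

# The unmatched stores are gone from the rows, but their levels remain.
levels(merged$StoreName)

# droplevels() removes unused levels from every factor column of a data frame.
merged_clean <- droplevels(merged)

# tapply() now only reports the stores that actually appear in the merged rows.
counts <- tapply(merged_clean$ItemID, merged_clean$StoreName,
                 FUN = function(x) length(unique(x)))
counts
```

Compared with stringsAsFactors = FALSE, this keeps the column as a factor, which may matter if later code relies on factor behaviour such as level ordering.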
Source: "r - Inner join of two data frames still shows all values" on Stack Overflow: https://stackoverflow.com/questions/42128502/