我想按以下某种特殊方式按名称聚合数据框的两列:
通过专门汇总两列parts
和fruits
在结果中删除parts
列
苹果,香蕉和草莓的parts
值无关紧要,所有内容都已汇总,葡萄和猕猴桃的parts
值应成为新的fruits
名称
结果(在底部)应具有8个汇总行,而不是20个
乍一看,这听起来似乎很简单,但是经过数小时的反复试验,我没有找到任何有用的解决方案。例子如下:
theDF <- data.frame(dates = as.Date(c(today()+20)),
fruits = c("Apple","Apple","Apple","Apple","Banana","Banana","Banana","Banana",
"Strawberry","Strawberry","Strawberry","Strawberry","Grape", "Grape",
"Grape","Grape", "Kiwi","Kiwi","Kiwi","Kiwi"),
parts = c("Big Green Apple","Apple2","Blue Apple","XYZ Apple4",
"Yellow Banana1","Small Banana","Banana3","Banana4",
"Red Small Strawberry","Red StrawberryY","Big Strawberry",
"StrawberryZ","Green Grape", "Blue Grape", "Blue Grape",
"Blue Grape","Big Kiwi","Small Kiwi","Big Kiwi","Middle Kiwi"),
stock = as.vector(sample(1:20)) )
当前数据帧:
所需的输出:
最佳答案
我们可以使用data.table
。如果要删除某些模式,例如结尾字符为大写字母或“部件”列中的数字,则可以使用sub
进行操作,并将其与“日期”一起用作分组变量,并获得sum
股票'。
library(data.table)
setDT(theDF)[,.(stock = sum(stock)) , .(dates, fruits = sub("([0-9]|[A-Z])$", "", parts))]
# dates fruits stock
#1: 2016-06-19 Apple 46
#2: 2016-06-19 Banana 35
#3: 2016-06-19 Strawberry 38
#4: 2016-06-19 Green Grape 12
#5: 2016-06-19 Blue Grape 21
#6: 2016-06-19 Big Kiwi 37
#7: 2016-06-19 Small Kiwi 14
#8: 2016-06-19 Middle Kiwi 7
或使用
dplyr
,我们可以类似地实现相同的方法。library(dplyr)
theDF %>%
group_by(dates, fruits = sub('([0-9]|[A-Z])$', '', parts)) %>%
summarise(stock = sum(stock))
更新资料
如果没有模式,并且仅基于手动识别“水果”中的元素,则创建元素的
vector
,使用%chin%
获取“ i”中的逻辑索引,分配(:=
)中的值零件”对应于“ i”至“水果”,然后按“日期”,“水果”进行分组并获得“库存”的sum
。setDT(theDF)[as.character(fruits) %chin% c("Grape", "Kiwi"),
fruits := parts][, .(stock = sum(stock)), .(dates, fruits)]
数据
theDF <- structure(list(dates = structure(c(16971, 16971, 16971, 16971,
16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971,
16971, 16971, 16971, 16971, 16971, 16971, 16971), class = "Date"),
fruits = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 5L,
5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("Apple",
"Banana", "Grape", "Kiwi", "Strawberry"), class = "factor"),
parts = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 14L,
15L, 16L, 16L, 11L, 10L, 10L, 10L, 9L, 13L, 9L, 12L), .Label = c("Apple1",
"Apple2", "Apple3", "Apple4", "Banana1", "Banana2", "Banana3",
"Banana4", "Big Kiwi", "Blue Grape", "Green Grape", "Middle Kiwi",
"Small Kiwi", "StrawberryX", "StrawberryY", "StrawberryZ"
), class = "factor"), stock = c(8, 19, 15, 4, 6, 18, 1, 10,
9, 16, 11, 2, 12, 13, 5, 3, 17, 14, 20, 7)), .Names = c("dates",
"fruits", "parts", "stock"), row.names = c(NA, -20L), class = "data.frame")