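Using R, I need to create a report that lists the top 2 employees with the highest expenses in each department, plus an "Others" row for the remaining employees of that department. For example, I need a report similar to this: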

Dept.       EmployeeId   Expense
Marketing   12345            100
Marketing   12346             90
Marketing   Others           200
Sales       12347             50  <-- There's just one employee with expenses
Research    12348           2000
Research    12349            900
Research    Others         10000

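In other words, I need to summarize the data, focusing on the top 2 employees with the highest expenses. The Expense column should still add up to the company's total expenses.
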
employeIds <- sample(1000:9999, 20)
depts <- sample(c('Sales', 'Marketing', 'Research'), 20, replace = TRUE)
expenses <- sample(1:1000, 20, replace = TRUE)

df <- data.frame(employeIds, depts, expenses)

# Based on that data, how do I build a table with the top 2 employees with the most expenses in each department, including an "Other" employee per department.

I'm new to R and I'm not sure how to approach this. In SQL I could have used the RANK() function and JOINs, but that's not an option here.

Best answer

Here is a data.table solution:

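Creating the data: I have also arranged for a case where "Others" does not appear, because the department has too few entries (in the output below, Sales has only two).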

set.seed(45)
employeIds <- sample(1000:9999, 20)
depts <- sample(c('Sales', 'Marketing', 'Research'), 20, replace = TRUE)
expenses <- sample(1:1000, 20, replace = TRUE)

df <- data.frame(employeIds, depts, expenses)
df <- df[-c(6,10,12,18,19), ]

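The data.table solution:

library(data.table)
# Keying by depts, then expenses, sorts each department's rows by ascending expense.
dt <- data.table(df, key = c("depts", "expenses"))
k <- 2  # number of top spenders to list per department
dt[, if (.N > k) {
        # TRUE for the last k rows of the group, i.e. the k largest expenses
        idx <- seq_len(.N) > .N - k
        list(EmployeeIds = c(employeIds[idx], "Others"),
             Expenses = c(expenses[idx], sum(expenses[!idx])))
     } else {
        # k or fewer employees: list them all, no "Others" row
        list(EmployeeIds = as.character(employeIds), Expenses = expenses)
     }, by = depts]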

#        depts EmployeeIds Expenses
# 1: Marketing        4870      567
# 2: Marketing        3167      591
# 3: Marketing      Others     2285
# 4:  Research        5989      878
# 5:  Research        9667      930
# 6:  Research      Others     1301
# 7:     Sales        6700      129
# 8:     Sales        3857      714

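Idea: creating dt with key = depts, expenses first ensures that, within each department, the rows are sorted by ascending expenses, so the top spenders are the last rows of each group. Then, depending on the number of entries in each dept, we either add an "Others" entry or return the rows as they are.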
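
The same idea (sort within each department, keep the top k, sum the rest) does not depend on data.table. Below is a rough base R sketch of it. The function name top_k_with_others and the toy data are hypothetical, introduced only for this example, and the column names are assumed to match the df built above.

```r
# Hypothetical helper (not from the original answer): top-k rows per
# department plus an "Others" row holding the sum of the remaining rows.
top_k_with_others <- function(df, k = 2) {
  pieces <- lapply(split(df, df$depts), function(d) {
    d <- d[order(d$expenses, decreasing = TRUE), ]  # largest expense first
    is_top <- seq_len(nrow(d)) <= k
    out <- data.frame(depts       = as.character(d$depts[is_top]),
                      EmployeeIds = as.character(d$employeIds[is_top]),
                      Expenses    = d$expenses[is_top],
                      stringsAsFactors = FALSE)
    if (any(!is_top)) {
      # More than k employees: collapse the rest into one "Others" row
      out <- rbind(out,
                   data.frame(depts       = as.character(d$depts[1]),
                              EmployeeIds = "Others",
                              Expenses    = sum(d$expenses[!is_top]),
                              stringsAsFactors = FALSE))
    }
    out
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# Made-up toy data for illustration
toy <- data.frame(
  employeIds = c(11, 12, 13, 14, 21, 31, 32, 33),
  depts      = c("Marketing", "Marketing", "Marketing", "Marketing",
                 "Sales", "Research", "Research", "Research"),
  expenses   = c(90, 100, 40, 60, 50, 900, 2000, 300),
  stringsAsFactors = FALSE
)

res <- top_k_with_others(toy, k = 2)
print(res)
```

Because every employee outside the top k is folded into "Others", the Expenses column of the result adds up to the same total as the input, which is the property the question asks for.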

On the topic "r - Summarize and sort a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16148482/
