我有一个客户贷款数据库,我想按LoanRefID进行ddply摘要:

    LoanRefId               Tran_Type TransactionAmount
103        11               LoanIssue         1000.0000
104        11           InitiationFee          171.0000
105        11                Interest           59.6729
106        11       AdministrationFee           64.9332
107        11 RaisedClientInstallment         1295.5757
108        11       ClientInstallment         1295.4700
109        11                  PaidUp            0.0000
110        11              Adjustment            0.1361
111        11                  PaidUp            0.0000
112        12               LoanIssue         3000.0000
113        12           InitiationFee          399.0000
114        12                Interest           94.9858
115        12       AdministrationFee           38.6975
116        12 RaisedClientInstallment         3532.6350
117        12       ClientInstallment         3532.6100
118        12                  PaidUp            0.0000
119        12              Adjustment            0.0733
120        12                  PaidUp            0.0000

但是,我只想对每个loanID只汇总某些行。具体来说,我只想总结Tran_Type ==“ClientInstallment”的位置。

我能想到的唯一方法(似乎不起作用)是:
> ddply(test, c("LoanRefId"), summarise, cash_in = sum(test[test$Tran_Type == "ClientInstallment","TransactionAmount"]))

  LoanRefId cash_in
1        11 4828.08
2        12 4828.08

这不是每个LoanRefId的总和,仅是在Tran_Type ==“CLientInstallment”的所有金额之和,这是错误的。

有没有更好的方法来进行此逻辑求和?

最佳答案

有人可以添加plyr答案,但如今base Rdplyrdata.table更加广泛地使用。 plyr已更新和升级。值得花时间学习较新的实现,因为它们效率更高且具有很多功能。

基础R

aggregate(TransactionAmount ~ LoanRefId, df[df$Tran_Type == "ClientInstallment",], sum)
#  LoanRefId TransactionAmount
#1        11           1295.47
#2        12           3532.61

dplyr
library(dplyr)
df %>%
  group_by(LoanRefId) %>%
  filter(Tran_Type == "ClientInstallment") %>%
  summarise(TransactionAmount = sum(TransactionAmount))
#Source: local data frame [2 x 2]
#
#  LoanRefId TransactionAmount
#      (int)             (dbl)
#1        11           1295.47
#2        12           3532.61

数据表
setDT(df)[Tran_Type == "ClientInstallment", sum(TransactionAmount), by=LoanRefId]
#   LoanRefId      V1
#1:        11 1295.47
#2:        12 3532.61

注意data.table语法是多么干净:)。学习的好工具。

09-03 18:26