This post covers how to summarize all group values and a conditional subset in the same call.

Question

I'll illustrate my question with an example.

Sample data:

df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202))

df
  ID   A   B
1  1 foo   1
2  1 bar   5
3  2 foo   7
4  2 foo  23
5  3 bar  54
6  5 bar 202

What I want to do is to summarize, by ID, the sum of B and the sum of B when A is "foo". I can do this in a couple steps like:

require(magrittr)
require(dplyr)

df1 <- df %>%
  group_by(ID) %>%
  summarize(sumB = sum(B))

df2 <- df %>%
  filter(A == "foo") %>%
  group_by(ID) %>%
  summarize(sumBfoo = sum(B))

left_join(df1, df2, by = "ID")

  ID sumB sumBfoo
1  1    6       1
2  2   30      30
3  3   54      NA
4  5  202      NA
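
For an in-memory data frame, the two steps above can be folded into one grouped summarize by subsetting B inside the call. This is a minimal sketch, not something from the original post: it assumes the same sample data, it is not expected to translate to SQL, and the name `res` is only illustrative.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 5),
  A  = c("foo", "bar", "foo", "foo", "bar", "bar"),
  B  = c(1, 5, 7, 23, 54, 202),
  stringsAsFactors = FALSE
)

# One grouped summarize: the total of B, plus the total of B restricted to
# rows where A == "foo". Groups with no "foo" rows get 0 (the sum of an
# empty vector), not NA as in the left_join() version.
res <- df %>%
  group_by(ID) %>%
  summarize(sumB = sum(B), sumBfoo = sum(B[A == "foo"]))

print(res)
```

Note the difference from the joined result: IDs 3 and 5 come back with 0 rather than NA in sumBfoo.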

However, I'm looking for a more elegant/faster way, as I'm dealing with 10 GB+ of out-of-memory data in SQLite.

require(sqldf)
my_db <- src_sqlite("my_db.sqlite3", create = TRUE)
df_sqlite <- copy_to(my_db, df)

I thought of using mutate to define a new Bfoo column:

df_sqlite %>%
  mutate(Bfoo = ifelse(A=="foo", B, 0))

Unfortunately, this doesn't work on the database end of things.

Error in sqliteExecStatement(conn, statement, ...) :
  RS-DBI driver: (error in statement: no such function: IFELSE)

Recommended answer

Writing up @hadley's comment as an answer

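df_sqlite %>%
  group_by(ID) %>%
  # if/else is used here instead of ifelse(), which failed on the database side
  mutate(Bfoo = if (A == "foo") B else 0) %>%
  summarize(sumB = sum(B),
            sumBfoo = sum(Bfoo)) %>%
  collect()  # pull the summarized rows back into R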
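
For comparison, the same aggregation can be written directly in SQL with a CASE expression and run through DBI. This is a sketch under assumptions, not part of the original answer: it uses an in-memory SQLite database instead of my_db.sqlite3, the names `con` and `res_sql` are illustrative, and the SQL that dplyr actually generates may differ by version.

```r
library(DBI)

# Same sample data as in the question
df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 5),
  A  = c("foo", "bar", "foo", "foo", "bar", "bar"),
  B  = c(1, 5, 7, 23, 54, 202),
  stringsAsFactors = FALSE
)

# Assumption: an in-memory database stands in for the 10 GB+ file
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "df", df)

# CASE WHEN plays the role of the if/else: rows where A is not 'foo'
# contribute 0 to the conditional sum.
res_sql <- dbGetQuery(con, "
  SELECT ID,
         SUM(B) AS sumB,
         SUM(CASE WHEN A = 'foo' THEN B ELSE 0 END) AS sumBfoo
  FROM df
  GROUP BY ID
  ORDER BY ID
")

dbDisconnect(con)
print(res_sql)
```

Both sums are computed in a single pass inside the database, so only one row per ID is returned to R. As with the answer above, IDs without any "foo" rows get 0 rather than NA.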
