This article covers how to summarize a column in dplyr based on a mathematical condition.

Problem description

Building on this question, "Summarize with conditions in dplyr": I would like to use dplyr to summarize a column based on a mathematical condition (not string matching as in the linked post). For each sample, I need to find the measurement at which the ratio measurement/time is highest, while also creating a new column, ratio. I'd also like to carry the entire row through, which I'm unsure how to do with dplyr's summarize function.

Example data frame

print(df)

   sample     type time measurement
1       a bacteria   24     0.57561
2       a bacteria   44     1.67236
3       a bacteria   67     4.17100
4       a bacteria   88    11.51661
5       b bacteria   24     0.53269
6       b bacteria   44     1.24942
7       b bacteria   67     5.72147
8       b bacteria   88    11.04017
9       c bacteria    0     0.00000
10      c bacteria   24     0.47418
11      c bacteria   39     1.06286
12      c bacteria   64     3.59649
13      c bacteria   78     7.05190
14      c bacteria  108     7.27060

Desired output

  sample     type time measurement      ratio
1      a bacteria   88    11.51661 0.13087057
2      b bacteria   88    11.04017 0.12545648
3      c bacteria   78     7.05190 0.09040897
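As a sanity check, the ratio column is simply measurement/time evaluated on the chosen row. The short base-R snippet below (numbers copied from the example data) reproduces the desired values and shows why sample c selects time 78 rather than its final time point 108.

```r
# Ratios for the rows chosen in the desired output (values copied from the example data)
chosen <- c(a = 11.51661 / 88, b = 11.04017 / 88, c = 7.05190 / 78)
print(round(chosen, 8))

# Sample c's last time point has a lower ratio than time 78, so it is not selected
c_last <- 7.27060 / 108
print(round(c_last, 8))
print(chosen[["c"]] > c_last)
```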

Failed attempt

This returns only the two columns defined by group_by and summarize; I would like the entire row to carry through:

library(dplyr)
df %>%
    group_by(sample) %>%
    summarize(ratio = max(measurement/time, na.rm = TRUE))

  sample  ratio
  <fct>   <dbl>
1 a      0.131
2 b      0.125
3 c      0.0904
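summarize() collapses each group to a single row holding only the grouping columns and the new summary columns, so type, time and measurement are dropped by design. To keep whole rows, select rows instead of summarizing them. One option, sketched below as an alternative to the accepted answer, is slice() with which.max(): which.max() returns the position of the first maximum and skips NA/NaN values, such as the 0/0 ratio at time 0 in sample c. The example data is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Example data from the question, rebuilt inline
df <- data.frame(
  sample = factor(rep(c("a", "b", "c"), times = c(4, 4, 6))),
  type = factor(rep("bacteria", 14)),
  time = c(24, 44, 67, 88, 24, 44, 67, 88, 0, 24, 39, 64, 78, 108),
  measurement = c(0.57561, 1.67236, 4.171, 11.51661,
                  0.53269, 1.24942, 5.72147, 11.04017,
                  0, 0.47418, 1.06286, 3.59649, 7.0519, 7.2706)
)

best <- df %>%
  mutate(ratio = measurement / time) %>%  # 0/0 at time 0 in sample c yields NaN
  group_by(sample) %>%
  slice(which.max(ratio)) %>%             # which.max() skips NaN; one row per group
  ungroup()

print(best)
```

Unlike summarize(), slice() keeps every column of the selected row, so type, time and measurement survive alongside ratio.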

Reproducible data

df <- structure(list(sample = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"),
    type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L), .Label = "bacteria", class = "factor"),
    time = c(24, 44, 67, 88, 24, 44, 67, 88, 0, 24, 39, 64, 78,
    108), measurement = c(0.57561, 1.67236, 4.171, 11.51661,
    0.53269, 1.24942, 5.72147, 11.04017, 0, 0.47418, 1.06286,
    3.59649, 7.0519, 7.2706)), class = "data.frame", row.names = c(NA,
-14L))


Recommended answer

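library(dplyr)

df %>%
  mutate(ratio = measurement/time) %>%           # per-row ratio; the time-0 row in sample c gives 0/0 = NaN
  group_by(sample) %>%
  filter(ratio == max(ratio, na.rm = TRUE)) %>%  # keep the whole row(s) where each sample's ratio peaks
  ungroup()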
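Because the accepted answer uses filter() with ==, it returns every row that attains the group maximum, so a group with tied ratios yields more than one row. Rows whose ratio is NaN (the 0/0 case) compare as NA and are dropped by filter(). If exactly one row per group is required, slice_max() with with_ties = FALSE (available since dplyr 1.0.0) is one option. The small tied data set below is hypothetical and exists only to illustrate the difference.

```r
library(dplyr)

# Hypothetical data: sample "x" has two rows with the same ratio (0.1)
ties <- data.frame(
  sample = c("x", "x", "x", "y", "y"),
  time = c(10, 20, 0, 10, 20),
  measurement = c(1, 2, 0, 1, 3)
)

with_ratio <- ties %>%
  mutate(ratio = measurement / time) %>%  # time 0 with measurement 0 gives NaN
  group_by(sample)

# filter(): keeps all tied maxima, so "x" contributes two rows
all_max <- with_ratio %>%
  filter(ratio == max(ratio, na.rm = TRUE)) %>%
  ungroup()

# slice_max(): keeps exactly one row per group
one_max <- with_ratio %>%
  slice_max(ratio, n = 1, with_ties = FALSE) %>%
  ungroup()

print(all_max)
print(one_max)
```

Which behaviour is preferable depends on whether tied time points should all be reported or collapsed to one.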


07-25 02:50