Is cut()-style binning available in dplyr?

Question

Is there a way to do something like the cut() function for binning numeric values in a dplyr table? I'm working on a large Postgres table and can currently either write a CASE statement in the SQL at the outset, or output unaggregated data and apply cut(). Both have pretty obvious downsides: CASE statements are not particularly elegant, and pulling a large number of records via collect() is not at all efficient.

Recommended answer

Just so there's an immediate answer for others arriving here via search engine: the n-breaks form of cut() is now available in dplyr as the ntile() function. Note that ntile() assigns rows to n buckets of roughly equal size by rank, whereas cut(x, n) uses n equal-width intervals, so the two do not always agree. For example:

> library(dplyr)
> data.frame(x = c(5, 1, 3, 2, 2, 3)) %>% mutate(bin = ntile(x, 2))
  x bin
1 5   2
2 1   1
3 3   2
4 2   1
5 2   1
6 3   2
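
To make the difference between the binning styles concrete, here is a minimal sketch on the same toy data frame. The column names (rank_bin, width_bin, fixed_bin) and the break points 2 and 4 are assumptions chosen for illustration, not part of the original answer. It runs in memory only; on a remote table, dbplyr is expected to translate ntile() and case_when() to SQL (NTILE() and CASE WHEN), which may avoid both a hand-written CASE statement and a collect() of the raw rows.

```r
library(dplyr)

df <- data.frame(x = c(5, 1, 3, 2, 2, 3))

binned <- df %>%
  mutate(
    # Equal-count buckets by rank: gives 2 1 2 1 1 2
    rank_bin = ntile(x, 2),
    # Equal-width intervals (base R cut): gives 2 1 1 1 1 1
    width_bin = as.integer(cut(x, breaks = 2)),
    # Explicit break points (hypothetical cut-offs at 2 and 4)
    fixed_bin = case_when(
      x <= 2 ~ "low",
      x <= 4 ~ "mid",
      TRUE   ~ "high"
    )
  )

print(binned)
```

Here the value 3 lands in the upper bucket under ntile() but in the lower interval under cut(), which is why the choice between the two depends on whether equal counts or equal widths are wanted.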