Problem description
Is there a way to do something like a cut() function for binning numeric values in a dplyr table? I'm working on a large Postgres table and can currently either write a CASE statement in the SQL at the outset, or output unaggregated data and apply cut(). Both have pretty obvious downsides: CASE statements are not particularly elegant, and pulling a large number of records via collect() is not at all efficient.
Recommended answer
Just so there's an immediate answer for others arriving here via search engine, the n-breaks form of cut() is now implemented as the ntile() function in dplyr:
> data.frame(x = c(5, 1, 3, 2, 2, 3)) %>% mutate(bin = ntile(x, 2))
  x bin
1 5   2
2 1   1
3 3   2
4 2   1
5 2   1
6 3   2
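Note that ntile() splits the rows into groups of roughly equal size, so it does not cover the case where you want bins at specific break points. For that, one option is to spell the breaks out with case_when(). The following is only a minimal sketch on a local data frame: the break points (2 and 4) and the labels are made up for illustration. Applied to a remote table instead, the same mutate() should be translated by dbplyr into a SQL CASE WHEN expression, so the binning would run in the database rather than after collect().

```r
library(dplyr)

# Same toy data as above; the break points and labels below are illustrative assumptions.
df <- data.frame(x = c(5, 1, 3, 2, 2, 3))

binned <- df %>%
  mutate(bin = case_when(
    x <= 2 ~ "low",   # values up to and including 2
    x <= 4 ~ "mid",   # values above 2, up to and including 4
    TRUE   ~ "high"   # everything else
  ))

print(binned)
```

Conditions are evaluated in order, so each row gets the label of the first condition it satisfies. This still amounts to a CASE statement, but it is written in R and stays inside the dplyr pipeline.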