Problem description
I'm working with variables resembling the val values created below:
# data --------------------------------------------------------------------
data("mtcars")
val <- c(mtcars$wt, 10.55)
I'm cutting this variable in the following manner:
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
This produces the following result:
> table(res)
res
[ 1, 2) [ 2, 3) [ 3, 4) [ 4, 5) [ 5, 6)       6       7       8       9 [10,11]
      4       8      16       1       3       0       0       0       0       1
In the created output, I would like to change the following:
- I'm not interested in creating groups with one value. Ideally, I would like each group to have at least 3 or 4 values. Paradoxically, I can live with groups having 0 values, as those will be dropped later on when merging with my real data.
- Any changes to the cutting mechanism have to work on a variable with integer values.
- The cuts have to be pretty. I'm trying to avoid something like 1.23 - 2.35, even if those values would be the most sensible considering the distribution.
- In effect, what I'm trying to achieve is this: make more or less even, pretty groups, and if a really tiny group results, bump it together with the next group. Empty groups are not a concern.
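The last bullet can be read as a small algorithm: start from pretty breaks, then remove the break that separates a tiny non-empty group from its neighbour. What follows is only a minimal sketch of that reading, using base R's pretty(), cut() and table(). The helper name merge_small_bins and its min_n argument are made up for illustration and are not part of scales or Hmisc.

```r
data("mtcars")
val <- c(mtcars$wt, 10.55)

# Hypothetical helper: begin with pretty breaks, then repeatedly drop the break
# that closes the first non-empty bin holding fewer than min_n values.
# Empty bins are left untouched, as requested in the list above.
merge_small_bins <- function(x, n = 10, min_n = 3) {
  breaks <- pretty(x, n = n)
  repeat {
    # Left-closed bins, [a, b), with the last bin closed on both sides.
    bins <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
    counts <- as.vector(table(bins))
    small <- which(counts > 0 & counts < min_n)
    if (length(small) == 0 || length(breaks) <= 2) break
    i <- small[1]
    # Merge bin i into the next bin by dropping its upper break; the last bin
    # has no next bin, so it is merged into the previous one instead.
    drop <- if (i < length(counts)) i + 1 else i
    breaks <- breaks[-drop]
  }
  breaks
}

cut_breaks <- merge_small_bins(val, n = 10, min_n = 3)
table(cut(val, cut_breaks, right = FALSE, include.lowest = TRUE))
```

For the val vector above, this gives breaks at 1, 2, 3, 4 and 11, so the lone 10.55 ends up in a [4,11] group of five values. Every break still comes from pretty(), so the boundaries stay round, but the groups are only as even as the pretty grid allows.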
For convenience, the full code is available below:
# Libs --------------------------------------------------------------------
Vectorize(require)(package = c("scales", "Hmisc"),
character.only = TRUE)
# data --------------------------------------------------------------------
data("mtcars")
val <- c(mtcars$wt, 10.55)
# Cuts --------------------------------------------------------------------
cut_breaks <- pretty_breaks(n = 10, eps.correct = 0)(val)
res <- cut2(x = val, cuts = cut_breaks)
What I've tried
First approach
I tried to play with the eps.correct = 0 value in pretty_breaks, as in the code:
cut_breaks <- pretty_breaks(n = cuts, eps.correct = 0)(variable)
but none of the values gets me anywhere close.
I've also tried using the m = 5 argument in the cut2 function, but I keep arriving at the same result.
I tried the mybreaks function, but I would have to put some work into it to get nice cuts for more bizarre variables. Broadly speaking, pretty_breaks cuts well for me; it is just the tiny groups that occur from time to time that are not desired.
> set.seed(1); require(scales)
> mybreaks <- function(x, n, r=0) {
+ unique(round(quantile(x, seq(0, 1, length=n+1)), r))
+ }
> x <- runif(n = 100)
> pretty_breaks(n = 5)(x)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> mybreaks(x = x, n = 5)
[1] 0 1
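The collapse to 0 1 above looks like a rounding effect rather than a problem with the quantiles themselves: with r = 0, every quantile of a variable that lives between 0 and 1 rounds to either 0 or 1. A small sketch, assuming the same mybreaks definition, shows that choosing r to match the scale of the data keeps more distinct breaks:

```r
set.seed(1)
x <- runif(n = 100)

# Same function as in the session above, written with the full argument name.
mybreaks <- function(x, n, r = 0) {
  unique(round(quantile(x, seq(0, 1, length.out = n + 1)), r))
}

b0 <- mybreaks(x, n = 5, r = 0)  # rounded to whole numbers
b1 <- mybreaks(x, n = 5, r = 1)  # rounded to one decimal place
length(b0)
length(b1)
```

So r has to be picked per variable, which is the extra work mentioned above.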
Recommended answer
You could use the quantile() function as a relatively easy way to get similar numbers of observations in each of your groups.
For example, here's a function that takes a vector of values x, a desired number of groups n, and a desired number of decimal places r to round the breaks to, and gives you suggested cut points.
mybreaks <- function(x, n, r=0) {
unique(round(quantile(x, seq(0, 1, length=n+1)), r))
}
cut_breaks <- mybreaks(val, 5)
res <- cut(val, cut_breaks, include.lowest=TRUE)
table(res)
 [2,3]  (3,4] (4,11]
     8     16      5
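Note that the table above sums to 29 although val has 33 values. The smallest value, 1.513, rounds up to a lowest break of 2, so the four weights below 2 fall outside every interval and become NA. One possible fix, sketched here under the assumption that pushing the two outer breaks outward is acceptable, is to use floor() for the lowest break and ceiling() for the highest one. The name mybreaks_covering is made up for illustration.

```r
data("mtcars")
val <- c(mtcars$wt, 10.55)

# Hypothetical variant of mybreaks(): inner breaks are rounded as before, but
# the outer breaks are moved outward so that every observation is covered.
mybreaks_covering <- function(x, n, r = 0) {
  q <- quantile(x, seq(0, 1, length.out = n + 1), names = FALSE)
  b <- round(q, r)
  step <- 10^(-r)  # size of one rounding unit
  b[1] <- floor(min(x) / step) * step
  b[length(b)] <- ceiling(max(x) / step) * step
  unique(b)
}

cut_breaks <- mybreaks_covering(val, 5)
res <- cut(val, cut_breaks, include.lowest = TRUE)
table(res, useNA = "ifany")
```

With val this yields breaks at 1, 2, 3, 4 and 11, and no observation is lost.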