Problem description
Following some great advice from before, I'm now writing my second R function and using similar logic. However, I'm trying to automate a bit more and may be getting too smart for my own good.
I want to break the clients into quintiles based on the number of orders. Here's my code to do so:
# sample data
clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)
df <- data.frame(cbind(clientID, orders))

# function to break them into quintiles
ApplyQuintiles <- function(x) {
  cut(x, breaks = c(quantile(df$orders, probs = seq(0, 1, by = 0.20))),
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"))
}

# add the quintile to the data frame
df$Quintile <- sapply(df$orders, ApplyQuintiles)
table(df$Quintile)
  0-20  20-40  40-60  60-80 80-100
    40     39     44     38     36
You'll see here that in my sample data I created 200 observations, yet only 197 are listed via table. The remaining 3 are NA.
Now, there are some clientIDs that have an NA for quintile. It seems that if they were at the lowest break, in this case 1, they were not included by the cut function.
Is there a way to make cut inclusive of all observations?
Recommended answer
Try the following:
set.seed(700)
clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)
df <- data.frame(cbind(clientID, orders))

ApplyQuintiles <- function(x) {
  cut(x, breaks = c(quantile(df$orders, probs = seq(0, 1, by = 0.20))),
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"), include.lowest = TRUE)
}

df$Quintile <- sapply(df$orders, ApplyQuintiles)
table(df$Quintile)
  0-20  20-40  40-60  60-80 80-100
    40     41     39     40     40
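I added include.lowest=TRUE to your cut call, which seems to make it work. By default cut uses intervals that are open on the left and closed on the right, so a value equal to the lowest break falls outside the first interval and becomes NA; include.lowest=TRUE closes that first interval. See ?cut for more details.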
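To make the interval behaviour concrete, here is a minimal sketch rather than a definitive implementation. The first part uses a toy vector to show where the NA comes from. The second part defines apply_quintiles, a hypothetical helper that is not part of the original question or answer: it assumes you are happy to pass the whole orders vector at once, so it needs neither sapply nor the global df, and it assumes the quantile breaks are distinct (cut stops with an error if they are not).

```r
# Part 1: toy illustration of cut()'s default right-closed intervals.
x <- c(1, 2, 3, 4, 5)
brks <- c(1, 3, 5)

# Default intervals are (1,3] and (3,5], so the value 1 is left out.
default_bins <- cut(x, breaks = brks)

# include.lowest = TRUE turns the first interval into [1,3].
closed_bins <- cut(x, breaks = brks, include.lowest = TRUE)

sum(is.na(default_bins))  # 1
sum(is.na(closed_bins))   # 0

# Part 2: hypothetical vectorized helper (the name is an assumption).
# cut() is already vectorized, so it can take the full vector directly
# and compute the quintile breaks from that same vector.
apply_quintiles <- function(x) {
  cut(x,
      breaks = quantile(x, probs = seq(0, 1, by = 0.20)),
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
      include.lowest = TRUE)
}

set.seed(700)
clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)
df <- data.frame(clientID, orders)

df$Quintile <- apply_quintiles(df$orders)
table(df$Quintile, useNA = "ifany")
```

Passing useNA = "ifany" to table makes any leftover NA values show up in the counts instead of being dropped silently, which is how the 3 missing observations went unnoticed in the first place.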