

Following some great advice from before, I'm now writing my second R function and using similar logic. However, I'm trying to automate a bit more and may be getting too smart for my own good.


I want to break the clients into quintiles based on the number of orders. Here's my code to do so:

# sample data
clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)

df <- data.frame(clientID, orders)

#function to break them into quintiles
ApplyQuintiles <- function(x) {
  cut(x, breaks=c(quantile(df$orders, probs = seq(0, 1, by = 0.20))),
      labels=c("0-20","20-40","40-60","60-80","80-100"))
}

#Add the quintile to the dataframe
df$Quintile <- sapply(df$orders, ApplyQuintiles)

table(df$Quintile)

0-20   20-40   40-60    60-80   80-100
40     39      44       38      36

You'll see here that my sample data has 200 observations, yet table lists only 197. The remaining 3 are NA.

So some clientIDs have NA for their quintile. It seems that observations sitting exactly at the lowest break (1 in this case) are not included by the cut function.
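
A minimal sketch of that behaviour, using a small made-up vector rather than the sample data above: by default cut builds right-closed intervals such as (1,3], so a value equal to the lowest break falls outside every bin.

```r
# Hypothetical values chosen for illustration, not the sample data
x <- c(1, 2, 3, 4, 5)
breaks <- c(1, 3, 5)

# Default right = TRUE gives the intervals (1,3] and (3,5];
# the value 1 equals the lowest break and lands in neither
default_bins <- cut(x, breaks = breaks)
print(default_bins)

# Count how many values were left without a bin
sum(is.na(default_bins))
```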

Is there a way to make cut inclusive of all observations?




clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)

df <- data.frame(clientID, orders)

ApplyQuintiles <- function(x) {
  cut(x, breaks=c(quantile(df$orders, probs = seq(0, 1, by = 0.20))),
      labels=c("0-20","20-40","40-60","60-80","80-100"), include.lowest=TRUE)
}

df$Quintile <- sapply(df$orders, ApplyQuintiles)

table(df$Quintile)

0-20  20-40  40-60  60-80 80-100
  40     41     39     40     40

I added include.lowest=TRUE to your cut call, which puts values equal to the lowest break into the first interval, so all 200 observations are now counted. See ?cut for more details.
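
As a further sketch, assuming the quintile breaks come out distinct (cut stops with an error when breaks are not unique), the same idea can be written without sapply, because cut is already vectorised. The name apply_quintiles and the seed are illustrative and not part of the original post.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible
orders <- round(runif(200, min = 1, max = 50), 0)

# Hypothetical helper: derives the breaks from its own argument
# instead of reaching for a global data frame
apply_quintiles <- function(x) {
  cut(x,
      breaks = quantile(x, probs = seq(0, 1, by = 0.20)),
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"),
      include.lowest = TRUE)  # keep values equal to the lowest break
}

# One call on the whole vector; useNA would reveal any unbinned values
quintile <- apply_quintiles(orders)
table(quintile, useNA = "ifany")
```

Computing the breaks from the argument keeps the function usable on any numeric vector, not only df$orders.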

