


This is a small example. In my larger dataset, I have multiple years of data, and the number of observations per group (div) is not always equal.


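df <- data.frame(
  year = 2014,
  # fixed ids (instead of sample()) so the printed output below is reproducible
  id = c("G","J","N","U","E","S","W","M","L","B","D","C"),
  div = rep(c("1", "2a", "2b"), each=4),
  pts = c(9,7,9,3,7,5,3,7,2,7,7,1),
  x = c(10,12,11,7,7,5,4,12,4,6,7,2)
)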


#   year id div pts  x
#1  2014  G   1   9 10
#2  2014  J   1   7 12
#3  2014  N   1   9 11
#4  2014  U   1   3  7
#5  2014  E  2a   7  7
#6  2014  S  2a   5  5
#7  2014  W  2a   3  4
#8  2014  M  2a   7 12
#9  2014  L  2b   2  4
#10 2014  B  2b   7  6
#11 2014  D  2b   7  7
#12 2014  C  2b   1  2

I want to rank this data so that individuals in div 1 are ranked above those in div 2a/2b, and within div 1 individuals are ranked 1, 2, 3, 4 by highest 'pts', with ties broken by highest 'x'.

Individuals in div 2a and div 2b should each be ranked separately using the same criteria. This would look like this:

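library(dplyr)

df %>% 
  group_by(div) %>%
  # .by_group = TRUE sorts by div first; arrange() ignores groups otherwise
  arrange(desc(pts), desc(x), .by_group = TRUE) %>%
  mutate(position = row_number())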

#   year id div pts  x position
#1  2014  N   1   9 11        1
#2  2014  G   1   9 10        2
#3  2014  J   1   7 12        3
#4  2014  U   1   3  7        4
#5  2014  M  2a   7 12        1
#6  2014  E  2a   7  7        2
#7  2014  S  2a   5  5        3
#8  2014  W  2a   3  4        4
#9  2014  D  2b   7  7        1
#10 2014  B  2b   7  6        2
#11 2014  L  2b   2  4        3
#12 2014  C  2b   1  2        4

However, I also want a final column that is another rank. It should rank all individuals in div 1 above those in 2a/2b, but treat 2a and 2b as equal: individuals ranked 1 in 2a/2b should both get 5.5 (the mean of overall places 5 and 6), and individuals ranked 2 should both get 7.5 (the mean of places 7 and 8). There is always an equal number of individuals in div 2a and div 2b in every year.


#   year id div pts  x position final
#1  2014  N   1   9 11        1   1.0  
#2  2014  G   1   9 10        2   2.0
#3  2014  J   1   7 12        3   3.0
#4  2014  U   1   3  7        4   4.0
#5  2014  M  2a   7 12        1   5.5
#6  2014  E  2a   7  7        2   7.5
#7  2014  S  2a   5  5        3   9.5
#8  2014  W  2a   3  4        4  11.5
#9  2014  D  2b   7  7        1   5.5
#10 2014  B  2b   7  6        2   7.5  
#11 2014  L  2b   2  4        3   9.5
#12 2014  C  2b   1  2        4  11.5
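The values in the final column follow a simple pattern: with n1 individuals in div 1, the two individuals at within-div position r in 2a and 2b would occupy overall places n1 + 2r - 1 and n1 + 2r, and both receive the mean of those two places. A small check of that arithmetic (the helper `div2_final` is hypothetical, not from any package):

```r
# Hypothetical helper: shared final rank for the 2a/2b pair at a given
# within-div position, given the number of div 1 individuals (n1).
div2_final <- function(position, n1) {
  # the tied pair occupies places n1 + 2*position - 1 and n1 + 2*position;
  # both get the mean of those two places
  ((n1 + 2 * position - 1) + (n1 + 2 * position)) / 2
}

div2_final(1:4, n1 = 4)
#> [1]  5.5  7.5  9.5 11.5
```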

Ideally I'd like a dplyr solution. It also needs to generalize to years where the number of individuals in div 1 varies and the number of individuals in div 2a/2b varies (although length(div2a) == length(div2b) always holds).
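A minimal dplyr sketch of one way to do this, assuming `year` identifies each season and that div codes start with the division number (single digit): rank within each year/div, collapse 2a/2b into a shared key, number the rows within each year in that order, and give each tied 2a/2b pair the mean of their row numbers. The column names `div_clean` and `slot` are illustrative only.

```r
library(dplyr)

# Example data from the question (ids fixed instead of sampled)
df <- data.frame(
  year = 2014,
  id = c("G","J","N","U","E","S","W","M","L","B","D","C"),
  div = rep(c("1", "2a", "2b"), each = 4),
  pts = c(9,7,9,3,7,5,3,7,2,7,7,1),
  x = c(10,12,11,7,7,5,4,12,4,6,7,2)
)

ranked <- df %>%
  # "2a"/"2b" -> "2"; assumes single-digit division numbers (character sort)
  mutate(div_clean = sub("(\\d+).*", "\\1", div)) %>%
  # position within each year/div by pts, then x (both descending)
  group_by(year, div) %>%
  arrange(desc(pts), desc(x), .by_group = TRUE) %>%
  mutate(position = row_number()) %>%
  ungroup() %>%
  # per year: div 1 positions first, then 2a/2b rows grouped by position
  arrange(year, div_clean, position) %>%
  group_by(year) %>%
  mutate(slot = row_number()) %>%
  # each tied 2a/2b pair shares the mean of its slots
  group_by(year, div_clean, position) %>%
  mutate(final = mean(slot)) %>%
  ungroup() %>%
  arrange(year, div, position) %>%
  select(-div_clean, -slot)

ranked
```

Because every step is grouped by `year`, the div 1 count and the 2a/2b sizes can differ from year to year.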



This is how I'd do it:

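library(data.table)
dt = as.data.table(df)

# rank within each div by pts, then x (both descending)
dt[order(-pts, -x), rank.init := 1:.N, by = div]

# drop the letter suffix so that 2a and 2b share the key "2"
dt[, div.clean := sub('(\\d+).*', '\\1', div)]
setorder(dt, div.clean, rank.init)

# after sorting, .I is the overall row number; tied 2a/2b rows get the mean of theirs
dt[, rank.final := mean(.I), by = .(div.clean, rank.init)]
setorder(dt, div, rank.final)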
#    year id div pts  x rank.init div.clean rank.final
# 1: 2014  N   1   9 11         1         1        1.0
# 2: 2014  G   1   9 10         2         1        2.0
# 3: 2014  J   1   7 12         3         1        3.0
# 4: 2014  U   1   3  7         4         1        4.0
# 5: 2014  M  2a   7 12         1         2        5.5
# 6: 2014  E  2a   7  7         2         2        7.5
# 7: 2014  S  2a   5  5         3         2        9.5
# 8: 2014  W  2a   3  4         4         2       11.5
# 9: 2014  D  2b   7  7         1         2        5.5
#10: 2014  B  2b   7  6         2         2        7.5
#11: 2014  L  2b   2  4         3         2        9.5
#12: 2014  C  2b   1  2         4         2       11.5
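The snippet above ranks by div alone and uses the global row index `.I`, which works for the single-year example; with several years in one table it would pool all years together. A per-year variant under that assumption adds `year` to the grouping and uses a row counter that restarts each year. The 2015 rows below are hypothetical (made-up ids and values, 2 individuals in div 1 and 3 each in div 2a/2b), included only to exercise the grouping; `slot` is an illustrative helper column.

```r
library(data.table)

# 2014 is the question's example; 2015 is HYPOTHETICAL data
dt <- data.table(
  year = c(rep(2014, 12), rep(2015, 8)),
  id   = c("G","J","N","U","E","S","W","M","L","B","D","C",
           "A","F","H","I","K","O","P","Q"),
  div  = c(rep(c("1", "2a", "2b"), each = 4),
           "1", "1", "2a", "2a", "2a", "2b", "2b", "2b"),
  pts  = c(9,7,9,3,7,5,3,7,2,7,7,1, 8,6,5,4,3,6,2,1),
  x    = c(10,12,11,7,7,5,4,12,4,6,7,2, 3,3,3,3,3,3,3,3)
)

# rank within each year/div, not within div alone
dt[order(-pts, -x), rank.init := seq_len(.N), by = .(year, div)]
dt[, div.clean := sub("(\\d+).*", "\\1", div)]
setorder(dt, year, div.clean, rank.init)

# per-year row counter instead of the global .I, so each year starts at 1
dt[, slot := seq_len(.N), by = year]
dt[, rank.final := mean(slot), by = .(year, div.clean, rank.init)]
dt[, slot := NULL]
setorder(dt, year, div, rank.final)

dt[year == 2015]
```

For the hypothetical 2015 rows this gives 1 and 2 in div 1, and 3.5, 5.5, 7.5 for positions 1 to 3 in both 2a and 2b, while 2014 matches the output above.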


