本文介绍了列值排名的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我想按列对值进行排名。
I want to rank values column-wise.
我有以下数据框:
dput(test)
structure(list(Name = c("A", "B", "C", "D"), Margin = c(744,
3196.4722, 0, 394), T1 = c(420, 200, 2150, 70), T2 = c(630, 285,
2365, 84), T3 = c(630, 335, 2580, 105), T4 = c(666, 410, 2795,
128), T5 = c(2244, 2961.7931, 3010, 142), T6 = c(2244, 3652.472,
3440, 151), T7 = c(2244, 3722.472, 3870, 168), T8 = c(2244, 3887.472,
5160, 187), T9 = c(2244, 4112.472, 6450, 225), T10 = c(2244,
4337.472, 6450, 225), T11 = c(798, 3567.472, 4300, 112), T12 = c(630,
3582.472, 4300, 111), T13 = c(702, 3582.472, 4300, 112), T14 = c(3600,
4637.472, 3440, 78), T15 = c(744, 3067.306, 2580, 274), T16 = c(744,
2770.5666, 2580, 197), T17 = c(744, 3138.806, 2580, 80), T18 = c(2244,
3920.0836, 3870, 401), T19 = c(2244, 2789.1117, 1290, 127)), .Names = c("Name",
"Margin", "T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9",
"T10", "T11", "T12", "T13", "T14", "T15", "T16", "T17", "T18",
"T19"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
每行都有唯一的名称中的ID,我想对列进行排名,以确定哪一列等于或小于margin列中的值。
Each row has unique ID in name, and I want to rank the columns to determine which column is equal or least small to the value in the margin column.
理想的输出为:
Name Margin Closest_Column
A 744.000 T15
断裂带可能是随机的。
Break ties could be random.
我的尝试:
nm1 <- paste("rank", names(test)[3:21], sep="_")
test[nm1] <- mutate_all(test[3:21],funs(rank(., ties.method="first")))
推荐答案
如果需要使用 tidyverse
,一种方法是 rowwise
,然后找到保证金与其他列之间的最小差值的索引以获取列名
If we need to use tidyverse
, one approach is rowwise
and then find the index of the minimum difference between the 'Margin' and other columns to get the column names
test %>%
rowwise() %>%
do(data.frame(.[1:2], Closest_column = names(.)[3:21][which.min(abs(.[[2]]-
unlist(.[3:21])))]))
# A tibble: 4 x 3
# Name Margin Closest_column
#* <chr> <dbl> <chr>
#1 A 744.000 T15
#2 B 3196.472 T17
#3 C 0.000 T19
#4 D 394.000 T18
或者另一个选择是
Or another option is
library(tidyverse)
gather(test, Closest_column, val, T1:T19) %>%
group_by(Name) %>%
slice(which.min(abs(Margin - val))) %>%
select(-val)
# A tibble: 4 x 3
# Groups: Name [4]
# Name Margin Closest_column
# <chr> <dbl> <chr>
#1 A 744.000 T15
#2 B 3196.472 T17
#3 C 0.000 T19
#4 D 394.000 T18
使用 base R
,有效的选择是 max.col
cbind(test[1:2],
Closest_column = names(test)[3:21][max.col(-abs(test[3:21]-test[[2]]), 'first')])
# Name Margin Closest_column
#1 A 744.000 T15
#2 B 3196.472 T17
#3 C 0.000 T19
#4 D 394.000 T18
这篇关于列值排名的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!