Fastest way to extract distinct values in R
Question
I wanted to recreate the example of the fastest method of extracting sorted unique values demonstrated in this post: What is the fastest way to get a vector of sorted unique values from a data.table?
test_df <-
data.frame(
company = c(1, 1, 2, 2, 3)
)
unique_values = df[,logical(1), keyby = company]$company
But I keep getting an error.
Edit. Note that the focus of my question is to get this specific method to work. For proposals of other methods which achieve the goal, please follow the post to which I refer.
Recommended answer
In case you are looking for a fast unique, have a look at kit::funique:
setDTthreads(1)
microbenchmark::microbenchmark(
y[,logical(1), keyby = company]$company,
unique(x$company),
funique(x$company)
)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# y[, logical(1), keyby = company]$company 12.151625 12.436920 13.506817 12.58519 12.76036 97.318758 100 b
# unique(x$company) 12.932633 13.145706 13.717273 13.33529 14.54441 15.511965 100 b
# funique(x$company) 2.403889 2.659345 2.748425 2.72396 2.78017 3.507635 100 a
setDTthreads(4)
microbenchmark::microbenchmark(
y[,logical(1), keyby = company]$company,
unique(x$company),
funique(x$company)
)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# y[, logical(1), keyby = company]$company 5.038178 5.144970 5.907699 5.210202 6.804902 12.671440 100 b
# unique(x$company) 12.961273 13.136794 13.700900 13.315550 14.256065 21.449808 100 c
# funique(x$company) 2.604594 2.667491 2.738920 2.717532 2.786240 3.115353 100 a
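One point worth keeping in mind when reading these timings: the three expressions do not necessarily return values in the same order. The `keyby` form returns the groups sorted, while base `unique()` keeps the order of first appearance. A minimal sketch on a small hypothetical vector (not the benchmark data) shows the difference; `funique` is assumed here to follow the same ordering as `unique()`, which should be checked against the kit documentation.

```r
library(data.table)

# Small illustrative vector, deliberately not in sorted order
company <- c("W", "A", "T", "A", "S")
dt <- data.table(company = company)

# keyby groups and sorts by the key column, so the result is sorted
by_key <- dt[, logical(1), keyby = company]$company

# unique() keeps values in order of first appearance
by_unique <- unique(company)

# An explicit sort is needed to get the same result as the keyby form
sorted_unique <- sort(by_unique)
```

If sorted output is required, the cost of the extra `sort()` call would need to be included in the comparison.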
Data and libraries:
set.seed(42)
n <- 1e6
company <- c("A", "S", "W", "L", "T", "T", "W", "A", "T", "W")
item <- c("Thingy", "Thingy", "Widget", "Thingy", "Grommit",
"Thingy", "Grommit", "Thingy", "Widget", "Thingy")
sales <- c(120, 140, 160, 180, 200, 120, 140, 160, 180, 200)
x <- data.frame(company = sample(company, n, TRUE),
item = sample(item, n, TRUE),
sales = sample(sales, n, TRUE))
library(data.table)
y <- as.data.table(x)
library(kit)
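As for getting the specific method from the question to run: the snippet there creates `test_df` but then subsets an object called `df`, and `test_df` is a plain data.frame, whereas `keyby` is an argument of data.table's `[` method. The sketch below assumes those two points are the cause of the error (the error message itself is not shown in the question) and uses the hypothetical name `test_dt` for the converted table.

```r
library(data.table)

# Same data as in the question, built as a data.table instead of a data.frame
test_dt <- data.table(company = c(1, 1, 2, 2, 3))

# Group by company; keyby sorts the groups, and logical(1) is a cheap
# placeholder value computed per group. Only the key column is kept.
unique_values <- test_dt[, logical(1), keyby = company]$company
```

An existing data.frame can be converted with `as.data.table(test_df)`, as done for `y` in the data setup above.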