How many unique keys does my data.table have?

Problem description

Given a data.table, how do I find the number of unique keys it contains?

    library(data.table)
    z <- data.table(id = c(1, 2, 1, 3), key = "id")
    length(unique(z$id))
    ==> 3

The problem is that unique is quadratic in general, but since the keys in a data.table are sorted, it should be possible to find the number of unique keys in linear time.

Solution

Maybe this:

    sum(Negate(duplicated)(z$id))

z$id remains sorted, so duplicated can work faster on it:

    bigVec <- sample(1:100000, 30000000, replace = TRUE)
    system.time(sum(Negate(duplicated)(bigVec)))
       user  system elapsed
      8.161   0.475   8.690

    bigVec <- sort(bigVec)
    system.time(sum(Negate(duplicated)(bigVec)))
       user  system elapsed
       0.00    2.09    2.10

But I just checked, and length(unique()) works faster on sorted vectors as well. So maybe there is some kind of check for whether the vector is already sorted (which can be done in linear time).

To me this doesn't look quadratic:

    system.time(length(unique(bigVec)))
       user  system elapsed
      0.000   0.583   0.664

    bigVec <- sort(sample(1:100000, 20000000, replace = TRUE))
    system.time(length(unique(bigVec)))
       user  system elapsed
      0.000   1.290   1.242

    bigVec <- sort(sample(1:100000, 30000000, replace = TRUE))
    system.time(length(unique(bigVec)))
       user  system elapsed
      0.000   1.655   1.715
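For what it's worth, the linear-time idea from the question can be sketched directly in base R: in a sorted vector, a new distinct value starts exactly where an element differs from its predecessor, so one pass over adjacent pairs is enough. The helper name count_unique_sorted below is made up for illustration (it is not part of data.table), and the sketch assumes the input is already sorted and contains no NA values.

```r
# Count distinct values in an already-sorted vector in a single linear pass.
# count_unique_sorted is a hypothetical helper name, not a data.table function.
# Assumptions: x is sorted and contains no NA values.
count_unique_sorted <- function(x) {
  n <- length(x)
  if (n == 0L) return(0L)
  # Compare each element with its predecessor; every mismatch marks
  # the first occurrence of a new distinct value.
  1L + sum(x[-1L] != x[-n])
}

ids <- sort(c(1, 2, 1, 3))   # stands in for z$id, which a key keeps sorted
count_unique_sorted(ids)     # should agree with length(unique(ids))
```

As far as I know, base R's unique() and duplicated() are hash-based rather than quadratic, which would be consistent with the timings above. Recent data.table versions also provide uniqueN(), so something like uniqueN(z, by = key(z)) may be the simplest option if your installed version has it.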