问题描述
我已经看到一些解决类似问题的方法,但是它们都需要对要添加的项数进行迭代。
I've seen a few solutions to similar problems, but they all require iteration over the number of items to be added together.
这是我的目标:从数字列表中,找到所有总计为一定总数的组合(无替代)。例如,如果我有数字 1、1,2、3、5
和总数 5
,则应返回 5
, 2,3
和 1,1,3
。
Here's my goal: from a list of numbers, find all of the combinations (without replacement) that add up to a certain total. For example, if I have numbers 1,1,2,3,5
and total 5
, it should return 5
,2,3
, and 1,1,3
.
我试图使用 combn
,但是它要求您指定每个组合中的项目数。有没有一种方法可以允许任何大小的解决方案集?
I was trying to use combn
but it required you to specify the number of items in each combination. Is there a way to do it that allows for solution sets of any size?
推荐答案
这正是 RcppAlgos (我是作者)构建的> combo / permuteGeneral 。由于我们在样本向量中重复了特定元素,因此我们将找到的组合,符合我们的标准。请注意,这与生成具有重复的组合的更常见的情况不同,在重复的组合中,每个元素被允许重复 m 次。对于许多组合生成函数,多集会带来问题,因为引入了重复项并且必须对其进行处理。如果您的数据量很大,这可能成为代码中的瓶颈。 RcppAlgos
中的函数可以有效地处理这些情况,而不会产生任何重复的结果。我应该提到,还有其他一些很好的库可以很好地处理多集: multicool
和 arrangements
。
This is precisely what combo/permuteGeneral
from RcppAlgos
(I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different than the more common case of generating combinations with repetition where each element is allowed to be repeated m times. For many combination generating functions, multisets pose problems as duplicates are introduced and must be dealt with. This can become a bottleneck in your code if the size of your data is decently large. The functions in RcppAlgos
handle these cases efficiently without creating any duplicate results. I should mention that there are a couple of other great libraries that handle multisets quite well: multicool
and arrangements
.
继续进行当前的任务,我们可以利用 comboGeneral
的约束参数来找到向量的所有组合满足特定条件:
Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral
to find all combinations of our vector that meet a specific criteria:
vec <- c(1,1,2,3,5) ## using variables from @r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5
library(RcppAlgos)
lapply(seq_along(uni), function(x) {
comboGeneral(uni, x, freqs = myRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = ans)
})
[[1]]
[,1]
[1,] 5
[[2]]
[,1] [,2]
[1,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 1 3
[[4]]
[,1] [,2] [,3] [,4] ## no solutions of length 4
这些函数经过高度优化,可以很好地扩展到较大的案例。例如,请考虑以下示例,该示例将产生超过3000万个组合:
These functions are highly optimized and extend well to larger cases. For example, consider the following example that would produce over 30 million combinations:
## N.B. Using R 4.0.0 with new updated RNG introduced in 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))
rle(bigVec)
Run Length Encoding
lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...
bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12
comboCount(bigUni, len, freqs = bigRep)
[1] 32248100
所有300000+结果都很快返回:
All 300000+ results are returned very quickly:
system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = bigAns))
user system elapsed
0.273 0.004 0.271
head(bigTest)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 2 3 4 25 26 26 26 27 28 30
[2,] 1 1 2 3 5 24 26 26 26 27 28 30
[3,] 1 1 2 3 5 25 25 26 26 27 28 30
[4,] 1 1 2 3 7 24 24 26 26 27 28 30
[5,] 1 1 2 3 7 24 25 25 26 27 28 30
[6,] 1 1 2 3 7 24 25 26 26 26 28 30
nrow(bigTest)
[1] 280018
all(rowSums(bigTest) == bigAns)
[1] TRUE
附录
我必须提一提的是,当我遇到如下问题时:找到所有合计为特定数字的组合 我的第一个想法是。例如,在相关问题,我们可以轻松地使用分区
库。但是,这种方法不能扩展到一般情况(如此处所示),在这种情况下,向量包含特定的重复项,或者我们的向量包含不容易转换为等价整数的值(例如,向量(0.1,0.2,0.3,0.4)
可以很容易地视为 1:4
,但是对待 c(3.98486 7.84692 0.0038937 7.4879)
作为整数,随后应用整数分区方法将需要大量的计算能力,从而使此方法无用)。
Addendum
I must mention that generally when I see a problem like: "finding all combinations that sum to a particular number" my first thought is integer partitions. For example, in the related problem Getting all combinations which sum up to 100 in R, we can easily solve with the partitions
library. However, this approach does not extend to the general case (as we have here) where the vector contains specific repetition or we have a vector that contains values that don't easily convert to an integer equivalent (E.g. the vector (0.1, 0.2, 0.3, 0.4)
can easily be treated as 1:4
, however treating c(3.98486 7.84692 0.0038937 7.4879)
as integers and subsequently applying an integer partitions approach would require an extravagant amount of computing power rendering this method useless).
这篇关于查找一组数字的所有组合,这些组合的总和为某个总数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!