本文介绍了在生成.NET的所有整数随机,非重复序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有没有.NET的方式产生的全部 32位整数(的Int32 )以随机顺序,无需重复序列和在一个存储器有效的方式? 。内存效率将意味着最多可使用主内存只有几百兆字节



在理想情况下的顺序应该是这样的一个的IEnumerable< INT> ,它懒洋洋地返回序列中,要求只有当



我做了一些快速的研究,我发现了一些局部的解决方案,下一个号码。这样:




  • 使用 - 如果我理解正确的,它只是在递增序列生成的数字,并没有覆盖整个范围

  • 使用的或其它洗牌算法对集合 - 这将违反内存限制给出的大范围的

  • 评论。我试图同时使用 BitArray .NET和我的定制实现,因为.NET一是仅限于 int.MaxValue 项,所以不足以支付整数的整个范围。



    我喜欢这个主意的简单,我愿意牺牲的512 MB的内存,如果它工作得很好。不幸的是,运行时速度很慢,花费达几十秒钟生成我的机器,它有一个3.5 GHz的酷睿i7 CPU上的下一个随机数。所以很遗憾,如果你要求要产生很多很多的随机数,这是不能接受的。我想这是可预见的,虽然,这是一个O(M×N个)的算法,如果我没有记错,其中N为2 ^ 32,M为要求整数的数量,因此所有这些迭代产生一定的影响。



    在理想情况下,我想生成O(1)时间的下一个随机数,同时还能满足对内存的需求,也许下一个算法这里建议适合于这一点。我会尽快,我可以给他们一个尝试。



    更新3



    我刚刚测试过的的的,我可以说我很满意的结果。它看起来像在这个线程的赢家位置的有力争夺者。



    正确性:生成的只出现一次的所有整数(我用了一下向量。检查)



    随机性:还算不错



    内存使用:太好了,只有几个字节



    运行时间:生成一个随机整数非常快的,因为你可以期待从O(1)算法。生成每个整数共有约拿了。 11秒我的机器上。



    总之,我会说这是一个非常合适的技术,如果你不是在找高度随机的序列。



    更新4



    模块化的删除其与梅森难题的答案,恕不另行通知后,我在一家中指出评论说,我发现它产生比要求的重复号码越快。所以我想这完全排除了梅森倍捻机。



    更新6



    我测试了另一种建议的技术,看起来很有希望,,而我真正喜欢的随机数的质量,算法不适合用于产生在时间上可接受的量的整数的整个范围。所以很遗憾相比,能够完成过程中的其他技术,当它落在短。我用C#的实现从,顺便说一句 - 我改变了代码数量减少轮1的,但它仍然不能及时完成。



    毕竟,由上述结果来看,对于溶液我个人的选择进到 的技术,其次紧密的的。有些人可能会认为这是在某些方面与其他技术逊色,但考虑到我原来的限制,我会说这最适合他们。


    解决方案

    Actually, this can be done in most any language

    Yes.

    Here we need to agree on terminology since "random" is not what most people think it is. More on this in a moment.

    Yes.

    Yes.

    Ok, so would using almost no memory be acceptable? ;-)

    Before getting to the suggestion, we need to clear up the matter of "randomness". Something that is truly random has no discernible pattern. Hence, running the algorithm millions of times in a row could theoretically return the same value across all iterations. If you throw in the concept of "must be different from the prior iteration", then it is no longer random. However, looking at all of the requirements together, it seems that all that is really being asked for is "differing patterns of distribution of the integers". And this is doable.

    So how to do this efficiently? Make use of Modular multiplicative inverses. I used this to answer the following Question which had a similar requirement to generate non-repeating, pseudo-random, sample data within certain bounds:

    Generate different random time in the given interval

    I first learned about this concept here ( generate seemingly random unique numeric ID in SQL Server ) and you can use either of the following online calculators to determine your "Integer" and "Modular Multiplicative Inverses (MMI)" values:

    Applying that concept here, you would use Int32.MaxSize as the Modulo value.

    This would give a definite appearance of random distribution with no chance for collisions and no memory needed to store already used values.

    The only initial problem is that the pattern of distribution is always the same given the same "Integer" and "MMI" values. So, you could come up with differing patterns by either adding a "randomly" generated Int to the starting value (as I believe I did in my answer about generating the sample data in SQL Server) or you can pre-generate several combinations of "Integer" and corresponding "MMI" values, store those in a config file / dictionary, and use a .NET random function to pick one at the start of each run. Even if you store 100 combinations, that is almost no memory use (assuming it is not in a config file). In fact, if storing both as Int and the dictionary uses Int as an index, then 1000 values is approximately 12k?


    UPDATE

    Notes:

    • There is a pattern in the results, but it is not discernible unless you have enough of them at any given moment to look at in total. For most use-cases, this is acceptable since no recipient of the values would have a large collection of them, or know that they were assigned in sequence without any gaps (and that knowledge is required in order to determine if there is a pattern).
    • Only 1 of the two variable values -- "Integer" and "Modular Multiplicative Inverse (MMI)" -- is needed in the formula for a particular run. Hence:
      • each pair gives two distinct sequences
      • if maintaining a set in memory, only a simple array is needed, and assuming that the array index is merely an offset in memory from the base address of the array, then the memory required should only be 4 bytes * capacity (i.e. 1024 options is only 4k, right?)

    Here is some test code. It is written in T-SQL for Microsoft SQL Server since that is where I work primarily, and it also has the advantage of making it real easy-like to test for uniqueness, min and max values, etc, without needing to compile anything. The syntax will work in SQL Server 2008 or newer. For SQL Server 2005, initialization of variables had not been introduced yet so each DECLARE that contains an = would merely need to be separated into the DECLARE by itself and a SET @Variable = ... for however that variable is being initialized. And the SET @Index += 1; would need to become SET @Index = @Index + 1;.

    The test code will error if you supply values that produce any duplicates. And the final query indicates if there are any gaps since it can be inferred that if the table variable population did not error (hence no duplicates), and the total number of values is the expected number, then there could only be gaps (i.e. missing values) IF either or both of the actual MIN and MAX values are outside of the expected values.

    PLEASE NOTE that this test code does not imply that any of the values are pre-generated or need to be stored. The code only stores the values in order to test for uniqueness and min / max values. In practice, all that is needed is the simple formula, and all that is needed to pass into it is:

    • the capacity (though that could also be hard-coded in this case)
    • the MMI / Integer value
    • the current "index"

    So you only need to maintain 2 - 3 simple values.

    DECLARE @TotalCapacity INT = 30; -- Modulo; -5 to +4 = 10 OR Int32.MinValue
                                     -- to Int32.MaxValue = (UInt32.MaxValue + 1)
    DECLARE @MMI INT = 7; -- Modular Multiplicative Inverse (MMI) or
                          -- Integer (derived from @TotalCapacity)
    
    DECLARE @Offset INT = 0; -- needs to stay at 0 if min and max values are hard-set
    -----------
    DECLARE @Index INT = (1 + @Offset); -- start
    
    DECLARE @EnsureUnique TABLE ([OrderNum] INT NOT NULL IDENTITY(1, 1),
                                 [Value] INT NOT NULL UNIQUE);
    SET NOCOUNT ON;
    
    BEGIN TRY
        WHILE (@Index < (@TotalCapacity + 1 + @Offset)) -- range + 1
        BEGIN
            INSERT INTO @EnsureUnique ([Value]) VALUES (
                     ((@Index * @MMI) % @TotalCapacity) - (@TotalCapacity / 2) + @Offset
                                                       );
            SET @Index += 1;
        END;
    END TRY
    BEGIN CATCH
        DECLARE @Error NVARCHAR(4000) = ERROR_MESSAGE();
        RAISERROR(@Error, 16, 1);
        RETURN;
    END CATCH;
    
    SELECT * FROM @EnsureUnique ORDER BY [OrderNum] ASC;
    SELECT COUNT(*) AS [TotalValues],
           @TotalCapacity AS [ExpectedCapacity],
           MIN([Value]) AS [MinValue],
           (@TotalCapacity / -2) AS [ExpectedMinValue],
           MAX([Value]) AS [MaxValue],
           (@TotalCapacity / 2) - 1 AS [ExpectedMaxValue]
    FROM   @EnsureUnique;
    

    这篇关于在生成.NET的所有整数随机,非重复序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

05-27 04:30
查看更多