Question
I know that all arrays in .NET are limited to 2 GB. Under this premise, I try not to allocate more than n = ((2^31) - 1) / 8 doubles in an array. Nevertheless, that number of elements still doesn't seem to be valid. Does anyone know how I can determine at run time the maximum number of elements, given sizeof(T)?
I know that any quantity approaching that number is simply a lot of elements but, for all intents and purposes, let's say I need it.
Note: I'm in a 64-bit environment, my application's target platform is AnyCPU, and at least 3100 MB of RAM is free.
Update: Thank you all for your contributions, and sorry I was so quiet. I apologise for the inconvenience. I have not been able to rephrase my question, but I can add that what I am looking for is a way to solve something like this:
template <class T>
array<T>^ allocateAnUsableArrayWithTheMostElementsPossible(){
return gcnew array<T>( ... );
}
The results in my own answer are kind of satisfactory, but not good enough. Furthermore, I haven't tested it on another machine (it is kind of hard to find another machine with more than 4 GB). Besides, I have been doing some research on my own and it seems there's no cheap way to calculate this at run time. Anyhow, that was just a plus: no user of what I am trying to accomplish can expect to use the feature I am implementing without having the capacity for it.
So, in other words, I just want to understand why the maximum number of elements of an array doesn't add up to 2GB, ceteris paribus. An upper bound is all I need for now.
Recommended answer
Update: answer completely rewritten. The original answer contained methods to find the largest possible addressable array on any system by divide and conquer; see the history of this answer if you're interested. The new answer attempts to explain the 56-byte gap.
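For readers who want the general idea behind the divide-and-conquer approach, here is a minimal sketch in plain C++. It is not the code from the original answer. The function name `largestAllocatable` and the `canAllocate` probe are hypothetical, and the sketch assumes the probe is monotonic (if n elements can be allocated, so can any smaller count). In a .NET program the probe would attempt the allocation and report whether it succeeded.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: returns the largest n in [0, upper] for which
// canAllocate(n) is true. Assumes canAllocate is monotonic and that an
// empty array (n == 0) can always be allocated.
inline std::int64_t largestAllocatable(
    std::int64_t upper,
    const std::function<bool(std::int64_t)>& canAllocate) {
    assert(upper >= 0);
    std::int64_t lo = 0;      // known (assumed) to succeed
    std::int64_t hi = upper;  // largest candidate still possible
    while (lo < hi) {
        // Round the midpoint up so the loop always makes progress.
        const std::int64_t mid = lo + (hi - lo + 1) / 2;
        if (canAllocate(mid)) {
            lo = mid;       // mid works, look for something larger
        } else {
            hi = mid - 1;   // mid fails, the answer is below it
        }
    }
    return lo;
}
```

With an upper bound near 2^31 this needs about 31 probes, although each probe is itself an expensive allocation attempt.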
In his own answer, AZ explained that the maximum array size is limited to less than the 2GB cap and, with some trial and error (or another method?), found the following (summary):
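- if the size of the type is 1, 2, 4 or 8 bytes, the maximum occupiable size is 2GB - 56 bytes;
- if the size of the type is 16 bytes, the maximum is 2GB - 48 bytes;
- if the size of the type is 32 bytes, the maximum is 2GB - 32 bytes.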
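To make these figures concrete, the sketch below turns the observed gaps into element counts. It is plain C++ arithmetic rather than anything the CLR exposes. The function names are hypothetical, "2GB" is assumed to mean 2^31 bytes, and the gap values are simply the ones reported in the summary above, not documented constants.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: "2GB" means 2^31 bytes.
const std::int64_t kTwoGiB = std::int64_t{1} << 31;

// Gap below 2GB (in bytes) as reported in the summary above.
// Returns -1 for element sizes the summary does not cover.
inline std::int64_t observedGapBytes(std::int64_t elementSize) {
    switch (elementSize) {
        case 1: case 2: case 4: case 8: return 56;
        case 16: return 48;
        case 32: return 32;
        default: return -1;
    }
}

// Largest element count whose data fits in (2GB - gapBytes) bytes.
inline std::int64_t maxElements(std::int64_t elementSize, std::int64_t gapBytes) {
    assert(elementSize > 0 && gapBytes >= 0);
    return (kTwoGiB - gapBytes) / elementSize;
}
```

For a double (8 bytes), `maxElements(8, observedGapBytes(8))` gives 268435449 elements, six fewer than the 268435455 obtained from ((2^31) - 1) / 8.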
I'm not entirely sure about the 16-byte and 32-byte situations. The total available size for the array might differ depending on whether it's an array of structs or of a built-in type. I'll focus on type sizes of 1 to 8 bytes (of which I'm not that sure either, see the conclusion).
To understand why the CLR does not allow exactly 2GB / IntPtr.Size elements, we need to know how an array is structured. A good starting point is this SO article, but unfortunately some of the information seems false, or at least incomplete. This in-depth article on how the .NET CLR creates runtime objects proved invaluable, as did this Arrays Undocumented article on CodeProject.
Taking all the information in these articles together, it comes down to the following layout for an array on 32-bit systems:
Single dimension, built-in type
SSSSTTTTLLLL[...data...]0000
^ sync block
    ^ type handle
        ^ length array
                        ^ NULL
Each part is one system DWORD in size. On 64-bit Windows, this looks as follows:
Single dimension, built-in type
SSSSSSSSTTTTTTTTLLLLLLLL[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                                    ^ NULL
The layout looks slightly different when it's an array of objects (i.e., strings, class instances). As you can see, the type handle of the objects in the array is added.
Single dimension, array of objects
SSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                        ^ type handle array element type
                                            ^ NULL
Looking further, we find that an array of a built-in type, or actually of any struct type, gets its own specific type handle (all uint arrays share the same one, but an int array has a different type handle than a uint or byte array). All arrays of objects share the same type handle, but have an extra field that points to the type handle of the objects.
A note on struct types: padding may not always be applied, which can make it hard to predict the actual size of a struct.
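To account for the 56 bytes in AZ's answer, I have to make a few assumptions. I assume that:
- the syncblock and the type handle count towards the size of an object;
- the variable holding the array reference (object pointer) counts towards the size of an object;
- the array's null terminator counts towards the size of an object.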
A syncblock is placed before the address the variable points at, which makes it look like it's not part of the object. But I believe it is in fact part of it, and that it counts towards the internal 2GB limit. Adding all these up, we get, for 64-bit systems:
ObjectRef +
Syncblock +
Typehandle +
Length +
Null pointer +
--------------
40 (5 * 8 bytes)
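The same tally can be written as a small function, which makes it easy to see how the total changes with the word size and with the assumption about the object reference. This is only a sketch of the assumptions stated above. The name `assumedOverheadBytes` is hypothetical, and every field is assumed to be exactly one pointer-sized word.

```cpp
#include <cassert>
#include <cstddef>

// Sums the fixed fields from the tally above, each assumed to occupy one
// pointer-sized word: syncblock, type handle, length and trailing null.
// includeObjectRef adds the variable that holds the array reference,
// which the text assumes is counted as well.
inline std::size_t assumedOverheadBytes(std::size_t wordSize, bool includeObjectRef) {
    assert(wordSize == 4 || wordSize == 8);
    std::size_t fields = 4;  // syncblock, type handle, length, null
    if (includeObjectRef) {
        fields += 1;         // the object reference itself
    }
    return fields * wordSize;
}
```

On a 64-bit system, `assumedOverheadBytes(8, true)` is 40 and `assumedOverheadBytes(8, false)` is 32, which leaves 16 or 24 bytes of the observed 56 unaccounted for.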
Not 56 yet. Perhaps someone can have a look with Memory View during debugging to check what the layout of an array looks like under 64-bit Windows.
My guess is something along these lines (take your pick, mix and match):
- 2GB will never be possible, as that is one byte into the next segment. The largest block should be 2GB - sizeof(int). But this is silly, as memory indexes should start at zero, not one;
- Any object larger than 85016 bytes will be put on the LOH (large object heap). This may include an extra pointer, or even a 16-byte struct holding LOH information. Perhaps this counts towards the limit;
- Aligning: assuming the objectref does not count (it is in another memory segment anyway), the total overhead is 32 bytes. It's quite possible that the system prefers 32-byte boundaries. Take a new look at the memory layout. If the starting point needs to be on a 32-byte boundary, and it needs room for the syncblock before it, the syncblock will end up at the end of the first 32-byte block. Something like this:
  XXXXXXXXXXXXXXXXXXXXXXXXSSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
  where XXX.. stands for the skipped bytes;
- Multi-dimensional arrays: if you create your arrays dynamically with Array.CreateInstance with 1 or more dimensions, a single-dimension array will be created with two extra DWORDs containing the size and the lower bound of the dimension (even if you have only one dimension, but only if the lower bound is specified as non-zero). I find this highly unlikely, as you would probably have mentioned it if this were the case in your code. But it would bring the total to 56 bytes of overhead ;).
From all I gathered during this little research, I think that Overhead + Aligning - Objectref is the most likely and most fitting conclusion. However, a "real" CLR guru might be able to shed some extra light on this peculiar subject.
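As a rough arithmetic check of the Overhead + Aligning - Objectref guess, the sketch below adds the skipped bytes to the four header words. It only restates the guess in code. The function names are hypothetical, and it assumes that the allocation begins on an aligned boundary and that the address just after the syncblock must land on the next one.

```cpp
#include <cassert>
#include <cstddef>

// Bytes skipped so that the address just after the syncblock falls on the
// next `alignment` boundary, assuming the allocation itself begins on one.
inline std::size_t skippedBytes(std::size_t alignment, std::size_t wordSize) {
    assert(wordSize > 0 && alignment >= wordSize && alignment % wordSize == 0);
    return alignment - wordSize;
}

// Skipped bytes plus the four pointer-sized fields (syncblock, type handle,
// length, trailing null); the object reference is deliberately not counted.
inline std::size_t alignedGapBytes(std::size_t alignment, std::size_t wordSize) {
    const std::size_t overhead = 4 * wordSize;
    return skippedBytes(alignment, wordSize) + overhead;
}
```

With a 32-byte boundary and 8-byte words, `alignedGapBytes(32, 8)` is 24 + 32 = 56, which matches the gap reported for element sizes of 1 to 8 bytes.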
None of these conclusions explains why 16-byte and 32-byte data types have a 48-byte and a 32-byte gap respectively.
Thanks for a challenging subject; I learned something along the way. Perhaps some people can take the downvote off when they find this new answer more related to the question (which I originally misunderstood, and I apologise for the clutter this may have caused).