Using arrays of fields instead of a massive number of objects

Question

In light of this article, I am wondering what people's experiences are with storing massive datasets (say, >10,000,000 objects) in memory using arrays to store data fields instead of instantiating millions of objects and racking up the memory overhead (say, 12-24 bytes per object, depending on which article you read). Data per property varies from item to item, so I can't use a strict Flyweight pattern, but I would envision something similar.

My idea of this sort of representation is that one has a 'template object'...

class Thing
{
  public double A;
  public double B;
  public int    C;
  public string D;
}

And then a container object with a method of creating an object on request...

class ContainerOfThings
{
  double[] ContainerA;
  double[] ContainerB;
  int[]    ContainerC;
  string[] ContainerD;

  public ContainerOfThings(int total)
  {
    //create arrays: one per field, each sized for the whole data set
    ContainerA = new double[total];
    ContainerB = new double[total];
    ContainerC = new int[total];
    ContainerD = new string[total];
  }

  public Thing GetThingAtPosition(int position)
  {
     Thing thing = new Thing(); //probably best done as a factory instead
     thing.A = ContainerA[position];
     thing.B = ContainerB[position];
     thing.C = ContainerC[position];
     thing.D = ContainerD[position];

     return thing;
  }
}

So that's a simple strategy but not very versatile, for example one can't create a subset (as a List) of 'Thing' without duplicating data and defeating the purpose of array field storage. I haven't been able to find good examples, so I would appreciate either links or code snippets of better ways to handle this scenario from someone who's done it...or a better idea.
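
To make the subset limitation concrete, here is a minimal sketch of one possible way around it: keep the per-field arrays, and represent a subset as a list of positions rather than a list of materialized objects. The names `ThingColumns`, `ThingRef`, and `FindPositions` are hypothetical and are not part of the original post; this illustrates the idea only and has not been tuned or measured.

```csharp
using System;
using System.Collections.Generic;

// Column-oriented storage: one array per field, as in the question.
public sealed class ThingColumns
{
    public readonly double[] A;
    public readonly double[] B;
    public readonly int[] C;
    public readonly string[] D;

    public ThingColumns(int total)
    {
        A = new double[total];
        B = new double[total];
        C = new int[total];
        D = new string[total];
    }

    public int Count => A.Length;

    // Returns a small struct (container reference + position), so reading
    // an item does not allocate a heap object per item.
    public ThingRef this[int position] => new ThingRef(this, position);

    // A subset is a list of positions; the field data itself is never copied.
    public List<int> FindPositions(Func<ThingRef, bool> predicate)
    {
        var positions = new List<int>();
        for (int i = 0; i < Count; i++)
        {
            if (predicate(this[i])) positions.Add(i);
        }
        return positions;
    }
}

// Hypothetical read-only handle onto one row of the columns.
public readonly struct ThingRef
{
    private readonly ThingColumns _columns;
    private readonly int _position;

    public ThingRef(ThingColumns columns, int position)
    {
        _columns = columns;
        _position = position;
    }

    public double A => _columns.A[_position];
    public double B => _columns.B[_position];
    public int C => _columns.C[_position];
    public string D => _columns.D[_position];
}
```

With this shape, `things.FindPositions(t => t.C > 3)` yields positions that can be resolved later through `things[position]`, so a subset costs one `int` per member instead of a copy of each item's fields.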

Answer

I guess there are several ways to approach this, and indeed you are onto a possible solution to limit the data in memory. However, I'm not sure that reducing your structure by even 24? bytes is going to do you a whole lot of good. Your structure is around 79 bytes (for a 15 char string) = 8 + 8 + 4 + 24? + 4 + 1 + (2 * character length), so your total gain is at best 25%. That doesn't seem very useful, since you'd have to be in a position where 10 million * 80 bytes fits in memory and 10 million * 100 bytes does not. That would mean that you're designing a solution that is on the edge of disaster: too many large strings, too many records, or some other program hogging memory, and your machine is out of memory.
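
The arithmetic behind that estimate can be spelled out in a few lines. This is a sketch only: the byte counts are the rough figures quoted above, not measurements of any particular runtime, and reading the "24?" term as per-object overhead is an assumption. `MemoryEstimate` and its members are hypothetical names.

```csharp
using System;

public static class MemoryEstimate
{
    // Mirrors the formula 8 + 8 + 4 + 24? + 4 + 1 + (2 * character length).
    // objectOverhead is the questioned "24?" term (assumed to be the
    // per-object overhead); pass 0 to model the array-of-fields layout.
    public static int BytesPerThing(int stringLength, int objectOverhead = 24)
    {
        const int doubles = 8 + 8;      // fields A and B
        const int integer = 4;          // field C
        const int fixedTerms = 4 + 1;   // remaining fixed terms, not itemized in the answer
        return doubles + integer + objectOverhead + fixedTerms + 2 * stringLength;
    }

    // Total size for a given number of items, in bytes.
    public static long TotalBytes(long count, int bytesPerItem) => count * bytesPerItem;
}
```

For a 15-character string, `MemoryEstimate.BytesPerThing(15)` gives 79 and `MemoryEstimate.BytesPerThing(15, 0)` gives 55. At 10 million items, `TotalBytes(10_000_000, 80)` is 800,000,000 bytes and `TotalBytes(10_000_000, 100)` is 1,000,000,000 bytes, which is the narrow window the paragraph above describes.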

If you need to support random access to n small records, where n = 10 million, then you should aim to design for at least 2n or 10n. Perhaps you're already considering this in your 10 million? Either way, there are plenty of technologies that can support this type of data access.

One possibility: if the string has a maximum length (ml) of a reasonable size (say 255), then you can use a simple ISAM store. Each record would be 8 + 8 + 4 + 255 bytes, and you can simply offset into a flat file to read them. If the record size is variable or possibly large, then you will want to use a different storage format and store offsets into the file.
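
As a rough illustration of the fixed-length record idea, the sketch below stores each item in 8 + 8 + 4 + 255 = 275 bytes and seeks straight to `index * 275`. It is not a full ISAM implementation. `FixedRecordFile` is a hypothetical name, and the string encoding is an assumption made for the example: UTF-8, zero-padded, with no embedded NUL characters.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical flat-file store of fixed-length records.
public sealed class FixedRecordFile : IDisposable
{
    private const int TextOffset = 8 + 8 + 4;               // A, B, C come first
    private const int TextBytes = 255;                      // fixed-size string field
    public const int RecordSize = TextOffset + TextBytes;   // 275 bytes

    private readonly FileStream _stream;

    public FixedRecordFile(string path)
    {
        _stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);
    }

    public void Write(long index, double a, double b, int c, string d)
    {
        byte[] text = Encoding.UTF8.GetBytes(d);
        if (text.Length > TextBytes)
            throw new ArgumentException("String does not fit in the fixed-size field.", nameof(d));

        byte[] record = new byte[RecordSize];   // unused text bytes stay zero (padding)
        BitConverter.GetBytes(a).CopyTo(record, 0);
        BitConverter.GetBytes(b).CopyTo(record, 8);
        BitConverter.GetBytes(c).CopyTo(record, 16);
        text.CopyTo(record, TextOffset);

        _stream.Seek(index * RecordSize, SeekOrigin.Begin);
        _stream.Write(record, 0, record.Length);
    }

    public (double A, double B, int C, string D) Read(long index)
    {
        byte[] record = new byte[RecordSize];
        _stream.Seek(index * RecordSize, SeekOrigin.Begin);

        int read = 0;
        while (read < RecordSize)   // Stream.Read may return fewer bytes than requested
        {
            int n = _stream.Read(record, read, RecordSize - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // The text ends at the first zero byte, or fills the whole field.
        int end = Array.IndexOf(record, (byte)0, TextOffset);
        int textLength = (end < 0 ? RecordSize : end) - TextOffset;
        string d = Encoding.UTF8.GetString(record, TextOffset, textLength);

        return (BitConverter.ToDouble(record, 0), BitConverter.ToDouble(record, 8),
                BitConverter.ToInt32(record, 16), d);
    }

    public void Dispose() => _stream.Dispose();
}
```

Because every record has the same size, reading item 2 needs one seek and one 275-byte read, regardless of how many records the file holds. The byte order written by `BitConverter` depends on the machine, so a file written this way is only portable between machines with the same endianness.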

Another possibility: if you're looking up values by some key, then I would recommend something like an embedded database or a BTree, one where you can disable some of the disk consistency to gain performance. As it happens, I wrote a BPlusTree for client-side caches of large volumes of data. Detailed information on using the B+Tree is here.
