Problem description
I am writing an application which parses a large file, generates a large amount of data and does some complex visualization with it. Since all this data can't be kept in memory, I did some research and I'm starting to consider embedded databases as a temporary container for this data.
My question is: is this a traditional way of solving this problem? And, apart from structuring the data, is an embedded database supposed to manage it by keeping only a subset in memory (like a cache) while the rest is kept on disk? Thank you.
Edit: to clarify: I am writing a desktop application. The application will take as input a file hundreds of MB in size. After reading the file, the application will generate a large number of graphs, which will be visualized. Since the graphs may have a very large number of nodes, they may not fit into memory. Should I save them into an embedded database, which will take care of keeping only the relevant data in memory? (Do embedded databases do that?) Or should I write my own sophisticated module that does that?
Tough question - but I'll share my experience and let you decide if it helps.
If you need to retain the output from processing the source file, and you use that to produce multiple views of the derived data, then you might consider using an embedded database. The reasons to use an embedded database (IMHO):
- To take advantage of RDBMS features (ACID, relationships, foreign keys, constraints, triggers, aggregation...)
- To make it easier to export the data in a flexible manner
- To give external clients access to your processed data (in a known format)
- To allow more flexible transformation of the data when preparing for viewing
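As a rough illustration of those features, and of the question "do embedded databases keep only part of the data in memory?", here is a minimal sketch using SQLite through Python's standard `sqlite3` module. Python and the `node`/`edge` schema are assumptions chosen for illustration (the question names neither). The database lives in a file on disk; `PRAGMA cache_size` with a negative value sets an approximate page-cache budget in KiB, and pages outside that cache are read from disk when a query needs them. Foreign keys give you referential integrity between nodes and edges for free.

```python
import sqlite3


def open_graph_store(path: str, cache_kib: int = 8192) -> sqlite3.Connection:
    """Open (or create) an on-disk SQLite file holding graph nodes and edges.

    The table layout is a hypothetical example, not taken from the question.
    """
    conn = sqlite3.connect(path)
    # Foreign-key enforcement is off by default in SQLite; enable it per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # A negative cache_size is a budget in KiB: SQLite caches roughly this much
    # of the database's pages in memory and reads other pages from disk on demand.
    conn.execute(f"PRAGMA cache_size = {-int(cache_kib)}")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS node (
            id    INTEGER PRIMARY KEY,
            label TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS edge (
            src INTEGER NOT NULL REFERENCES node(id),
            dst INTEGER NOT NULL REFERENCES node(id),
            PRIMARY KEY (src, dst)
        );
        """
    )
    return conn


def neighbours(conn: sqlite3.Connection, node_id: int) -> list:
    """Return the ids of nodes one edge away from node_id, in ascending order."""
    rows = conn.execute("SELECT dst FROM edge WHERE src = ? ORDER BY dst", (node_id,))
    return [dst for (dst,) in rows]
```

Insert rows inside a `with conn:` block so each batch is committed as one transaction (and rolled back on error); an edge pointing at a node that doesn't exist is rejected with `sqlite3.IntegrityError` instead of silently corrupting the graph.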
Factors which you should consider when making the decision:
- What are the target platforms (Windows, Linux, Android, iPhone, PDA)?
- What technology base? (Java, .Net, C, C++, ...)
- What resource constraints are expected or need to be designed for? (RAM, CPU, HD space)
- What operational behaviours do you need to take into account (connected to network, disconnected)?
On a typical modern desktop there is enough spare capacity to handle most operations. On eeePCs, PDAs, and other portable devices, maybe not. On embedded devices, very likely not. The language you use may have built-in features to help with memory management - maybe you can take advantage of those. The connectivity aspect (stateful / stateless / etc.) may affect how much you really need to keep in memory at any given point.
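One example of such a built-in feature, sketched here in Python (an assumption; the question doesn't name a language), is the standard-library `mmap` module. If the derived data is written as fixed-size binary records, the file can be memory-mapped and read at random by index; the operating system pages in only the parts that are touched. The record layout below (node id plus x/y layout coordinates) and the `PositionFile` name are hypothetical.

```python
import mmap
import struct
from typing import Iterable, Tuple

# Hypothetical fixed-size record: node id (uint32) then x, y layout coordinates
# (two float64), little-endian with no padding, so 20 bytes per node.
RECORD = struct.Struct("<Idd")


def write_positions(path: str, positions: Iterable[Tuple[float, float]]) -> None:
    """Write one fixed-size record per node; node ids are assigned in order."""
    with open(path, "wb") as f:
        for node_id, (x, y) in enumerate(positions):
            f.write(RECORD.pack(node_id, x, y))


class PositionFile:
    """Random access to node positions without reading the whole file into RAM.

    The file is memory-mapped, so the operating system loads only the pages
    that are actually accessed and can evict them again under memory pressure.
    """

    def __init__(self, path: str):
        self._file = open(path, "rb")
        # Length 0 maps the whole file (mapping an empty file raises ValueError).
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self._map) // RECORD.size

    def __getitem__(self, i: int) -> Tuple[float, float]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Decode a single record straight from the mapped bytes.
        _node_id, x, y = RECORD.unpack_from(self._map, i * RECORD.size)
        return x, y

    def close(self) -> None:
        self._map.close()
        self._file.close()

    def __enter__(self) -> "PositionFile":
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```

Fixed-size records are what make `i * RECORD.size` a valid offset; variable-length data would need a separate index.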
If you are dealing with really big files, then you might consider a streaming approach, so that only a small portion of the overall data is in memory at any one time - but that doesn't really mean you should (or shouldn't) use an embedded database. Straight text or binary files could work just as well (record based, column based, line based... whatever).
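A minimal sketch of the streaming idea with plain line-based files, in Python: the input is processed one line at a time and results are written out immediately, so memory use doesn't grow with file size. The `src,dst,weight` input format and the weight filter are hypothetical stand-ins for whatever the real parsing step produces.

```python
from typing import Iterable, Iterator, Tuple

Edge = Tuple[int, int, float]


def parse_edges(lines: Iterable[str]) -> Iterator[Edge]:
    """Lazily parse hypothetical 'src,dst,weight' lines, skipping blanks and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst, weight = line.split(",")
        yield int(src), int(dst), float(weight)


def stream_filter_edges(src_path: str, dst_path: str, min_weight: float) -> int:
    """Copy edges with weight >= min_weight into a tab-separated, line-based file.

    The file object reads a buffered chunk at a time and the loop handles one
    line at a time, so memory use stays roughly constant however large the
    input is. Returns the number of edges written.
    """
    written = 0
    with open(src_path, encoding="utf-8") as fin, \
            open(dst_path, "w", encoding="utf-8") as fout:
        for src, dst, weight in parse_edges(fin):
            if weight >= min_weight:
                fout.write(f"{src}\t{dst}\t{weight}\n")
                written += 1
    return written
```

Because `parse_edges` is a generator, further per-record steps can be chained onto it without ever materialising the whole data set.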
Some databases give you more effective ways to interact with the data once it is stored - it depends on the engine. I find that if your base files (by which I mean the files you generate initially from the original source) require a lot of aggregation, then an RDBMS engine can be very helpful in simplifying your logic. Other options include building your base transform and then adding steps that process it into other temporary stores, one for each specific view, which are then in turn processed for rendering to the target (report?) format.
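To show what "simplifying your logic" can look like, here is a small Python/SQLite sketch (the `edge` table and the "busiest nodes" view are hypothetical examples): one `GROUP BY` query replaces the dictionaries, counters and sort code you would otherwise write by hand for each aggregated view.

```python
import sqlite3
from typing import Iterable, List, Tuple


def load_edges(conn: sqlite3.Connection, edges: Iterable[Tuple[int, int, float]]) -> None:
    """Bulk-load derived (src, dst, weight) edges into a table, in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge ("
        "src INTEGER NOT NULL, dst INTEGER NOT NULL, weight REAL NOT NULL)"
    )
    # An index on src helps the per-node lookups and grouping on large tables.
    conn.execute("CREATE INDEX IF NOT EXISTS edge_src ON edge (src)")
    with conn:
        conn.executemany("INSERT INTO edge (src, dst, weight) VALUES (?, ?, ?)", edges)


def busiest_nodes(conn: sqlite3.Connection, limit: int) -> List[Tuple[int, int, float]]:
    """Per source node: out-degree and total outgoing weight, highest degree first.

    Grouping, summing and sorting all happen inside the database engine.
    """
    return conn.execute(
        """
        SELECT src, COUNT(*) AS out_degree, SUM(weight) AS total_weight
        FROM edge
        GROUP BY src
        ORDER BY out_degree DESC, src
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

Each additional view of the derived data then becomes another query rather than another pass of custom code.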
Just a stream-of-consciousness response - hope that helps a little.
Edit:
Per your further clarification, I'm not sure an embedded database is the direction you want to take. You either need to make some sort of simplifying assumptions for rendering your graphs or investigate methods like segmentation (render sections of the graph and then cache the output before rendering the next section).
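The segmentation idea can be sketched roughly as follows, in Python. The layout plane is split into square tiles; only the tiles covering the current viewport are rendered, and rendered tiles are kept in a small least-recently-used cache so panning back doesn't re-render them. `render_tile` is a hypothetical caller-supplied function (for example, one that queries only the nodes inside a tile's bounds and rasterises them); the tiling scheme and LRU policy are assumptions, not something the answer prescribes.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

TileKey = Tuple[int, int]  # (column, row) of a square tile in layout coordinates


def tile_of(x: float, y: float, tile_size: float) -> TileKey:
    """Map a layout coordinate to the tile that contains it."""
    return int(x // tile_size), int(y // tile_size)


class TileCache:
    """Render a large graph one tile at a time, keeping only recent tiles.

    `render_tile` (hypothetical) loads just the nodes and edges inside one tile
    and returns the rendered image as bytes.
    """

    def __init__(self, render_tile: Callable[[TileKey], bytes], capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._render_tile = render_tile
        self._capacity = capacity
        self._tiles: "OrderedDict[TileKey, bytes]" = OrderedDict()
        self.renders = 0  # how many times a tile actually had to be rendered

    def get(self, key: TileKey) -> bytes:
        if key in self._tiles:
            self._tiles.move_to_end(key)  # mark as most recently used
            return self._tiles[key]
        image = self._render_tile(key)
        self.renders += 1
        self._tiles[key] = image
        if len(self._tiles) > self._capacity:
            self._tiles.popitem(last=False)  # evict the least recently used tile
        return image

    def visible(self, x0: float, y0: float, x1: float, y1: float,
                tile_size: float) -> List[bytes]:
        """Return the rendered tiles covering the viewport [x0, x1] x [y0, y1], row by row."""
        c0, r0 = tile_of(x0, y0, tile_size)
        c1, r1 = tile_of(x1, y1, tile_size)
        return [self.get((c, r))
                for r in range(r0, r1 + 1)
                for c in range(c0, c1 + 1)]
```

The cache capacity bounds how much rendered output stays in memory, independently of how many nodes the whole graph has.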