This article describes an efficient way to transfer many binary files into a SQL Server database. The question and accepted answer below should be a useful reference for anyone facing the same problem.
Problem Description
We have a requirement for a Winforms app to read thousands of files from a local filesystem (or a network location) and store them in a database.
I am wondering what would be the most efficient way to load the files? There could potentially be many gigabytes of data in total.
File.ReadAllBytes is currently used, but the application eventually locks up as the computer's memory is used up.
The current code loops through a table containing file paths, which are used to read the binary data:
protected CustomFile ConvertFile(string path)
{
try
{
byte[] file = File.ReadAllBytes(path);
return new CustomFile { FileValue = file };
}
catch
{
return null;
}
}
The data is then saved to the database (either SQL Server 2008 R2 or 2012) using NHibernate as the ORM.
Solution
First, let me state that my knowledge is pre-.NET 4.0, so this information may be outdated, because I know they were going to make improvements in this area.
Do not use File.ReadAllBytes to read large files (larger than 85 KB), especially when you are doing it to many files sequentially. I repeat, do not.
Use something like a stream with BinaryReader.Read instead, to buffer your reading. Even if this may not sound efficient, since you won't blast the CPU through a single buffer, doing it with ReadAllBytes simply won't work, as you discovered.
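To make that advice concrete, here is a minimal sketch (my own, not code from the question or answer) of buffered reading with a FileStream and BinaryReader. The 80,000-byte buffer size is an assumption chosen to stay under the roughly 85 KB LOH threshold, and the destination Stream stands in for wherever the chunks end up (for example, a stream that feeds the database):
// Sketch only; assumes `using System.IO;`.
// Copies a file in fixed-size chunks so no single large byte[] is ever allocated.
protected void CopyFileInChunks(string path, Stream destination)
{
    const int BufferSize = 80000; // assumption: stays below the ~85 KB LOH threshold

    using (var source = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, BufferSize))
    using (var reader = new BinaryReader(source))
    {
        var buffer = new byte[BufferSize];
        int bytesRead;

        // BinaryReader.Read returns at most BufferSize bytes per call, so memory
        // usage stays flat no matter how large the file is.
        while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, bytesRead);
        }
    }
}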
The reason is that ReadAllBytes reads the whole file into a single byte array. If that byte array is larger than about 85 KB in memory (there are other considerations, like the number of array elements), it goes onto the Large Object Heap, which is fine, BUT the LOH doesn't move memory around, nor does it defragment the released space, so, simplifying, this can happen:
- Read a 1GB file: you have a 1GB chunk in the LOH; save the file. (No GC cycle.)
- Read a 1.5GB file: you request a 1.5GB chunk of memory, which goes at the end of the LOH. But say a GC cycle runs, so the 1GB chunk you used before gets cleared; now the LOH spans 2.5GB of memory, with the first 1GB free.
- Read a 1.6GB file: the 1GB free block at the beginning isn't big enough, so the allocator goes to the end again. Now the LOH spans 4.1GB of memory.
- Repeat.
You are running out of memory, but you surely aren't actually using it all; fragmentation is probably killing you. You can also hit a genuine OOM situation if a file is very large (I think the process space for a 32-bit Windows process is 2GB?).
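As a quick illustration of that threshold (my own sketch, not part of the answer), on the .NET Framework a byte array of roughly 85,000 bytes or more is reported as generation 2 immediately after allocation, because the LOH is collected together with gen 2:
// Sketch only; assumes `using System;`.
static void ShowLohThreshold()
{
    byte[] small = new byte[80 * 1024];   // ~80 KB: allocated on the regular (small object) heap
    byte[] large = new byte[100 * 1024];  // ~100 KB: goes straight onto the Large Object Heap

    Console.WriteLine(GC.GetGeneration(small)); // typically prints 0
    Console.WriteLine(GC.GetGeneration(large)); // prints 2, because the LOH is collected with gen 2
}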
If the files aren't ordered or dependent on each other, maybe a few threads reading them with a buffered BinaryReader would get the job done, roughly as sketched below.
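A rough sketch of that idea, assuming .NET 4.0+ is available; StreamFileToDatabase is a hypothetical placeholder for "read this file in buffered chunks and persist it", for example built on the CopyFileInChunks sketch above:
// Sketch only; assumes `using System.Collections.Generic;` and `using System.Threading.Tasks;`.
// Processes independent files on a small, bounded number of worker threads.
protected void ImportFiles(IEnumerable<string> paths)
{
    var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // assumption: 4 workers

    Parallel.ForEach(paths, options, path =>
    {
        StreamFileToDatabase(path); // hypothetical: chunked read + database write for one file
    });
}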
References:
http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/learning-memory-management/memory-management-fundamentals
https://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
That concludes this article on an efficient way to transfer many binary files into a SQL Server database. We hope the recommended answer above is helpful, and thank you for your continued support!