Advantages of mmap vs. fileinput


Question

I read that mmap has an advantage over fileinput: it reads a page into the kernel page cache and shares that page with the user address space, whereas fileinput brings a page into the kernel and then copies each line into the user address space. So fileinput carries extra space overhead.

So I am planning to move to mmap, but I would like to hear from experienced Python hackers whether it actually improves performance.

If so, is there an implementation similar to fileinput that uses mmap?

If you know of any open-source code, please point me to it.

Thanks.

Recommended answer

mmap maps a file into memory so that you can index it like an array of bytes or treat it as one big data structure.
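To make the "index it like an array of bytes" idea concrete, here is a minimal sketch using Python's standard mmap module. The helper name peek_bytes and the temporary demo file are made up for illustration; they are not from any library.

```python
import mmap
import os
import tempfile


def peek_bytes(path, start, length):
    """Return `length` bytes at offset `start` by slicing a read-only mapping.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # A length of 0 maps the whole file; this raises on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing a mapping returns a bytes object, no seek()/read() needed.
            return mm[start:start + length]


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello mmap world\n")
    demo_path = tmp.name

result = peek_bytes(demo_path, 6, 4)
os.unlink(demo_path)
print(result)  # b'mmap'
```

The slice copies only the requested bytes into a Python object; the rest of the file is paged in by the operating system only when it is touched.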

It is a lot faster if you access your file in a "random-access" manner, that is, if you would otherwise be doing a lot of fseek(), fread(), fwrite() combinations.
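As a rough illustration of the random-access case, the sketch below jumps between fixed-size records through a mapping instead of issuing seek/read pairs. The record layout (one little-endian unsigned 32-bit integer per record) and the names RECORD and read_record are assumptions chosen for the demo, not anything from the question.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout for this demo: one little-endian uint32 per record.
RECORD = struct.Struct("<I")


def read_record(mm, index):
    """Return record number `index` from the mapping (hypothetical helper)."""
    offset = index * RECORD.size
    (value,) = RECORD.unpack(mm[offset:offset + RECORD.size])
    return value


# Build a throwaway file holding the squares of 0..9.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(10):
        tmp.write(RECORD.pack(i * i))
    demo_path = tmp.name

with open(demo_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Access records out of order; each lookup is just an offset computation.
        picked = [read_record(mm, i) for i in (7, 2, 9)]

os.unlink(demo_path)
print(picked)  # [49, 4, 81]
```

Whether this beats seek() plus read() in practice depends on the access pattern and file size, so it is worth measuring on your own data.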

But if you are just reading the file in and processing each line once (say), then it is unlikely to be significantly faster. In fact, for any reasonable file size it is probably indistinguishable. (Remember that mmap works best when the whole file fits in RAM; otherwise paging occurs, which begins to reduce its efficiency.)
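For anyone who still wants to try it, a fileinput-like line iterator over mmap can be sketched in a few lines. The function name mmap_lines is hypothetical, and this is only a rough stand-in: it yields bytes rather than str and offers none of fileinput's extras such as filename(), lineno(), or in-place editing.

```python
import fileinput
import mmap
import os
import tempfile


def mmap_lines(paths):
    """Yield each line (as bytes) from each file in turn, reading via mmap.

    Hypothetical helper; a sketch, not a drop-in fileinput replacement.
    """
    for path in paths:
        with open(path, "rb") as f:
            if os.fstat(f.fileno()).st_size == 0:
                continue  # an empty file cannot be mapped
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # mm.readline() returns b"" once the end of the mapping is reached.
                for line in iter(mm.readline, b""):
                    yield line


# Demo: two throwaway files, read once with each approach.
paths = []
for content in (b"a\nb\n", b"c\nd"):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(content)
        paths.append(tmp.name)

via_mmap = list(mmap_lines(paths))
with fileinput.input(files=paths, mode="rb") as fi:
    via_fileinput = list(fi)

for p in paths:
    os.unlink(p)

print(via_mmap == via_fileinput)
```

Both approaches produce the same lines here; to see whether either is faster for a sequential single pass, time them on files of realistic size.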
