Question
I'm learning about working with streams in Python, and I noticed that the IO docs say the following:
The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
In-memory binary streams are also available as BytesIO objects:
f = io.BytesIO(b"some initial binary data: \x00\x01")
What is the difference between f as defined by open and f as defined by BytesIO? In other words, what makes an "in-memory binary stream", and how is that different from what open does?
Recommended answer
For simplicity's sake, let's consider writing instead of reading for now.
So when you use open(), for example:
with open("test.dat", "wb") as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
After executing that, a file called test.dat will be created, containing Hello World three times. The data won't be kept in memory after it's written to the file (unless it is still referenced by a name).
Now consider io.BytesIO() instead:
import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
Instead of writing the contents to a file, this writes them to an in-memory buffer, in other words a chunk of RAM. Essentially, writing the following would be equivalent:
buffer = b""
buffer += b"Hello World"
buffer += b"Hello World"
buffer += b"Hello World"
To match the example that uses the with statement, there would also be a del buffer at the end.
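The equivalence described above can be checked with a minimal sketch, assuming Python 3. The variable names are illustrative only. Note that getvalue() has to be called while the stream is still open, because the buffer is discarded when the with block closes it.

```python
import io

# Build the data by plain concatenation.
concatenated = b""
for _ in range(3):
    concatenated += b"Hello World"

# Build the same data by writing to an in-memory stream.
with io.BytesIO() as f:
    for _ in range(3):
        f.write(b"Hello World")
    # Read the buffer back before the stream is closed.
    from_stream = f.getvalue()

print(from_stream == concatenated)
```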
The key difference here is optimization and performance. io.BytesIO is able to do some optimizations that make it faster than simply concatenating all the b"Hello World" one by one. Because bytes objects are immutable, each += generally has to build a new object and copy everything accumulated so far, whereas BytesIO appends to a buffer it can grow.
Just to prove it, here's a small benchmark:
- Concat: 1.3529 seconds
- BytesIO: 0.0090 seconds
import io
import time

# Time repeated concatenation onto a bytes object.
begin = time.time()
buffer = b""
for i in range(0, 50000):
    buffer += b"Hello World"
end = time.time()
seconds = end - begin
print("Concat:", seconds)

# Time the same writes into an in-memory stream.
begin = time.time()
buffer = io.BytesIO()
for i in range(0, 50000):
    buffer.write(b"Hello World")
end = time.time()
seconds = end - begin
print("BytesIO:", seconds)
Besides the performance gain, using BytesIO instead of concatenating has the advantage that BytesIO can be used in place of a file object. So say you have a function that expects a file object to write to. Then you can give it that in-memory buffer instead of a file.
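As a rough illustration of that interchangeability, the sketch below passes both a BytesIO and a real file to the same function. write_report and the file name report.dat are hypothetical, made up for this example, and the file is created in a temporary directory so nothing is left behind.

```python
import io
import os
import tempfile


def write_report(out):
    """Hypothetical function that writes to any binary file-like object."""
    out.write(b"header\n")
    out.write(b"body\n")


# Give the function an in-memory buffer.
buf = io.BytesIO()
write_report(buf)
in_memory = buf.getvalue()

# Give the same function a real file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.dat")
    with open(path, "wb") as f:
        write_report(f)
    with open(path, "rb") as f:
        on_disk = f.read()

print(in_memory == on_disk)
```

The function never needs to know which kind of object it received, which is also what makes BytesIO handy in tests.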
As for reading, the difference is that open("myfile.jpg", "rb") returns a file object backed by myfile.jpg on disk, whose contents are read on demand (for example with f.read()); whereas BytesIO, again, is just a buffer in memory containing some data.
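On the reading side, a BytesIO behaves like a file opened in "rb" mode: it keeps a current position and supports read(), tell() and seek(). The following is a small sketch assuming Python 3, using the sample bytes from the documentation excerpt in the question; the variable names are illustrative.

```python
import io

stream = io.BytesIO(b"some initial binary data: \x00\x01")

first = stream.read(4)      # read the first four bytes
position = stream.tell()    # the position has advanced past them
rest = stream.read()        # read from the current position to the end

stream.seek(0)              # rewind to the start, as with a real file
everything = stream.read()

print(first, position, first + rest == everything)
```

A file object returned by open() accepts the same calls, with the data coming from disk rather than from memory.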
Since BytesIO is just a buffer, if you wanted to write the contents to a file later, you'd have to do:
import io

buffer = io.BytesIO()
# ... write to the buffer here ...
with open("test.dat", "wb") as f:
    # getvalue() returns the whole buffer as a bytes object
    f.write(buffer.getvalue())
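A possible variation, sketched here under the assumption that the buffer is large enough for an extra copy to matter: getvalue() returns a copy of the data as bytes, while getbuffer() exposes a memoryview over the same contents without copying. The temporary directory is used only to keep the example self-contained.

```python
import io
import os
import tempfile

buffer = io.BytesIO()
buffer.write(b"Hello World" * 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dat")
    with open(path, "wb") as f:
        # Release the view when done; the buffer cannot be resized
        # while a view on it is still alive.
        with buffer.getbuffer() as view:
            f.write(view)
    size_on_disk = os.path.getsize(path)

print(size_on_disk)
```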
Also, you didn't mention a version; I'm using Python 3. Regarding the examples: I'm using the with statement instead of calling f.close().