Problem description
I want to read a file containing data and get its bytes in hex format:
01ff0aa121221aff110120...etc
The files contain more than 100,000 such bytes, some more than 1,000,000 (they come from DNA sequencing).
I tried the following code (and other similar variants):
filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
    a=f.read(1)
    b=a.encode("hex")
    c.append(b)
f.close()
This gives each byte separately ("aa", "01", "f1", etc.), which is perfect for me!
This works fine up to (in this case) byte no. 905, which happens to be "1a". I also tried the ord() function, which stopped at the same byte.
Is there a simple solution?
Recommended answer
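Reading most likely stopped at byte 905 because the file was opened in text mode: on Windows, a 0x1a byte (Ctrl-Z) is treated as end-of-file in text mode. The simple solution is to open the file in binary mode and use binascii: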
import binascii
# Open in binary mode (so you don't read two byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
    # Slurp the whole file and efficiently convert it to hex all at once
    hexdata = binascii.hexlify(f.read())
This just gets you a str of the hex values (a bytes object on Python 3), but it does so much faster than what you're trying to do. If you really want a separate length-2 hex string for each byte, you can convert the result easily:
hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))
which will produce a list of length-2 strs, one for the hex encoding of each byte. To avoid temporary copies of hexdata, you can use a similar but slightly less intuitive approach that avoids slicing by passing the same iterator to zip twice:
hexlist = map(''.join, zip(*[iter(hexdata)]*2))
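On Python 3 the two one-liners above behave a little differently: binascii.hexlify returns bytes, iterating over bytes yields integers rather than characters, and map returns a lazy iterator rather than a list. Below is a minimal sketch of the same pairing idea adapted for Python 3. It uses a small in-memory sample instead of data.geno, and the helper name hex_pairs and the sample bytes are made up for illustration.

```python
import binascii


def hex_pairs(raw):
    """Return a list of two-character hex strings, one per byte of raw."""
    # hexlify returns bytes on Python 3, so decode to a str of hex digits
    hexdata = binascii.hexlify(raw).decode("ascii")
    # Hand the same iterator to zip twice so it yields consecutive characters as pairs
    it = iter(hexdata)
    return [a + b for a, b in zip(it, it)]


# Hypothetical sample data; it deliberately includes a 0x1a byte
sample = bytes([0x01, 0xFF, 0x0A, 0x1A])
print(hex_pairs(sample))  # ['01', 'ff', '0a', '1a']
```

For a real file, the raw argument would be the result of f.read() on a file opened with 'rb'.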
Update:
For people on Python 3.5 and higher, bytes objects gained a .hex() method, so no module is required to convert from raw binary data to ASCII hex. The block of code at the top can be simplified to just:
with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
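If the per-byte list from the original question is still wanted on Python 3, one option is to format each byte directly, since iterating over a bytes object yields integers. This is a small sketch rather than part of the original answer. The function name bytes_to_hex_list and the sample bytes are hypothetical, and an in-memory value stands in for the contents of data.geno.

```python
def bytes_to_hex_list(raw):
    """Format every byte of raw as a two-digit lowercase hex string."""
    # Each element of a bytes object is an int in range(256) on Python 3
    return [format(b, "02x") for b in raw]


# Hypothetical sample standing in for f.read() on a file opened with 'rb'
sample = bytes([0xAA, 0x01, 0xF1, 0x1A])

hexdata = sample.hex()               # one continuous hex string
hexlist = bytes_to_hex_list(sample)  # one entry per byte

print(hexdata)  # aa01f11a
print(hexlist)  # ['aa', '01', 'f1', '1a']
```

Joining hexlist back together gives the same string as .hex(), so either form can be derived from the other.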