How to read a binary file as hex in Python?

Question

I want to read a file whose data looks like this in hex:

01ff0aa121221aff110120...etc

The files contain more than 100,000 such bytes, some more than 1,000,000 (they come from DNA sequencing).

I tried the following code (and other similar variants):

filele=1234563
f=open('data.geno','r')
c=[]
for i in range(filele):
  a=f.read(1)
  b=a.encode("hex")
  c.append(b)
f.close()

This gives each byte separately ("aa", "01", "f1", etc.), which is perfect for me!

This works fine up to (in this case) byte no. 905, which happens to be "1a". I also tried the ord() function, which stopped at the same byte.

Is there perhaps a simple solution?

Recommended answer

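The code in the question opens the file in text mode. On Windows, text mode treats the byte 0x1a (Ctrl-Z) as end-of-file, which is most likely why reading stopped at byte 905. Open the file in binary mode instead; the simple way to get the hex is then binascii: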

import binascii

# Open in binary mode (so you don't read two byte line endings on Windows as one byte)
# and use with statement (always do this to avoid leaked file descriptors, unflushed files)
with open('data.geno', 'rb') as f:
    # Slurp the whole file and efficiently convert it to hex all at once
    hexdata = binascii.hexlify(f.read())

This just gets you a single str of hex digits (a bytes object on Python 3), but it does so much faster than the byte-at-a-time loop in the question. If you really want a separate length-2 hex string for each byte, you can convert the result easily:

hexlist = map(''.join, zip(hexdata[::2], hexdata[1::2]))

This produces the list of length-2 strs corresponding to the hex encoding of each byte (on Python 2, where map returns a list). To avoid making temporary copies of hexdata, you can use a similar but slightly less intuitive approach that avoids slicing by passing the same iterator to zip twice:

hexlist = map(''.join, zip(*[iter(hexdata)]*2))
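
The same-iterator trick can be hard to picture, so here is a minimal sketch of it. It is written for Python 3 (an assumption; the lines above target Python 2), and the sample bytes are made up for illustration:

```python
# Sketch for Python 3; the sample bytes below are invented for illustration.
import binascii

raw = bytes([0x01, 0xFF, 0x0A, 0x1A])

# hexlify returns bytes on Python 3, so decode to get a str of hex digits
hexdata = binascii.hexlify(raw).decode("ascii")

# One iterator, passed to zip twice: for every tuple it builds, zip pulls
# two consecutive characters from the same iterator.
it = iter(hexdata)
hexlist = ["".join(pair) for pair in zip(it, it)]

print(hexlist)  # ['01', 'ff', '0a', '1a']
```

Because both arguments to zip are the same iterator object, no sliced copies of hexdata are created; the pairs are consumed in order as the list is built.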

Update:

For people on Python 3.5 and higher, bytes objects gained a .hex() method, so no module is required to convert raw binary data to ASCII hex. The block of code at the top can be simplified to just:

with open('data.geno', 'rb') as f:
    hexdata = f.read().hex()
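
If you want one two-character string per byte on Python 3, as the question asks, something along these lines should work. This is only a sketch: the helper name read_hex_bytes is hypothetical, and the demo writes a small throwaway file instead of using data.geno.

```python
# Sketch for Python 3; read_hex_bytes is a hypothetical helper name.
import os
import tempfile


def read_hex_bytes(path):
    """Return a list with one two-character hex string per byte of the file."""
    with open(path, "rb") as f:  # binary mode: no newline or EOF translation
        data = f.read()
    # Iterating over bytes yields ints on Python 3; format each as two hex digits
    return [format(b, "02x") for b in data]


# Demo with a throwaway file that includes the byte 0x1a
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0x01, 0xFF, 0x1A, 0x0A, 0xA1]))
result = read_hex_bytes(tmp.name)
os.remove(tmp.name)

print(result)  # ['01', 'ff', '1a', '0a', 'a1']
```

Reading the whole file at once is reasonable for files of a few megabytes like the ones described; the resulting list can be indexed per byte just like the list c in the question.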


09-13 14:25