python - Is there a better way to handle file encodings in Python?

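I have some text files in different, unknown encodings. Right now I have to open each file in binary mode to detect its encoding, and then open it a second time using that encoding.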

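  import chardet
  from os.path import splitext

  # First pass: read the raw bytes only to detect the encoding
  with open(f, 'rb') as bf:
    code = chardet.detect(bf.read())['encoding']
  print(f + ' : ' + str(code))
  # Second pass: reopen the file as text using the detected encoding
  with open(f, 'r', encoding=code) as source:
    texts = extractText(source.readlines())
  # The with blocks close the files, so no explicit close() calls are needed
  with open(splitext(f)[0] + '_texts.txt', 'w', encoding='utf-8') as dist:
    dist.write('\n\n'.join('\n'.join(x) for x in texts))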

So is there a better way to solve this?

Best answer

There is no need to reopen and re-read the file. You can simply decode the bytes you have already read:

import chardet

# Read the raw bytes once, detect the encoding, then decode those same bytes
with open(filename, 'rb') as fileobj:
    binary = fileobj.read()
probable_encoding = chardet.detect(binary)['encoding']
text = binary.decode(probable_encoding)
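
One caveat with the snippet above: chardet.detect() can report None as the encoding (for empty input, for instance), and a detected encoding is only a guess, so the decode step can still fail. The following is a minimal sketch of a more defensive decode step, not part of the original answer. The helper name decode_bytes and the choice of UTF-8 as the fallback are assumptions made for illustration.

```python
from typing import Optional


def decode_bytes(binary: bytes, detected: Optional[str], fallback: str = "utf-8") -> str:
    """Decode bytes with a detected encoding, falling back when detection fails.

    `detected` is meant to be the value of chardet.detect(binary)['encoding'],
    which may be None. `fallback` is an assumed default, not a chardet setting.
    """
    encoding = detected or fallback
    try:
        return binary.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # LookupError: Python has no codec with that name.
        # UnicodeDecodeError: the guess was wrong for these bytes.
        # Decode with the fallback and replace undecodable bytes with U+FFFD.
        return binary.decode(fallback, errors="replace")
```

With this helper, the last line of the answer would become text = decode_bytes(binary, probable_encoding). Replacing undecodable bytes loses information, so raising an error instead may be the better choice when the output has to be exact.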

Regarding "python - Is there a better way to handle file encodings in Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46202860/