(Python 3) How to pass a binary file as text without saving first
Problem description
Or, perhaps a better title: how to avoid unwanted extra carriage return when passing binary file to text mode write clause.
Python 3.6, Windows. Input file needs to undergo first a binary search/replace, and then a regex search/replace.
I first open the input file in binary mode, do the work, and save it in binary mode in a temporary file. Then I open that in text mode, do the regex search/replace, and save it in text mode (with a name resembling that of the input file).
import re

def fixbin(infile):
    with open(infile, 'rb') as f:
        file = f.read()
    # a few bytearray operations here, then:
    with open('bin.tmp', 'wb') as f:
        f.write(file)

def fix4801(fname, ext):
    outfile = '{}_OK{}'.format(fname, ext)
    with open('bin.tmp', encoding='utf-8-sig', mode='r') as f, \
            open(outfile, encoding='utf-8-sig', mode='w') as g:
        infile = f.read()
        x = re.sub(r'(\n4801.+\n)4801', r'\1 ', infile)
        g.write(x)

infile, fname, ext = get_infile()  # function get_infile not shown for brevity
fixbin(infile)
fix4801(fname, ext)
It works but it's ugly. I'd rather pass outputs as files, like so:
def fixbin(infile):
    with open(infile, 'rb') as f:
        file = f.read()
    # a few bytearray operations here, and then
    return file.decode('utf-8')

def fix4801(infile):
    x = re.sub(r'(\n4801.+\n)4801', r'\1 ', infile)
    return x

...
temp = fixbin(infile)
result = fix4801(temp)
outfile = '{}_OK{}'.format(fname, ext)
with open(outfile, encoding='utf-8-sig', mode='w') as g:
    g.write(result)
But then the output file (Windows) gets an unwanted extra carriage return. The symptoms are described here, but the cause is different: I'm not using os.linesep, in other words there is no os.linesep in my code. (There may be in the underlying libraries, I haven't checked.)
What am I doing wrong?
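In text mode, open() defaults to newline=None. The Python documentation for open() says that when writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
So although os.linesep never appears in your code, the text-mode write applies it for you. Assuming the input file has Windows line endings, the string returned by file.decode('utf-8') still contains '\r\n' (unlike text-mode reading, decode() does not translate line endings), and on Windows each '\n' in it is then written as '\r\n', which yields '\r\r\n'.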
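The newline translation on write can be reproduced on any platform with a small sketch. It is not the asker's code: the helper name written_bytes and the sample data are made up for illustration, and newline='\r\n' is used as a stand-in for what newline=None does on Windows, where os.linesep is '\r\n'.

```python
import os
import tempfile

def written_bytes(text, newline):
    """Write text in text mode with the given newline setting; return the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode='w', encoding='utf-8', newline=newline) as g:
            g.write(text)
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)

# Text decoded straight from bytes still carries Windows line endings.
decoded = b'4801 first\r\n4801 second\r\n'.decode('utf-8')

# newline='\r\n' stands in for newline=None on Windows (os.linesep == '\r\n'):
# every '\n' is expanded, so the existing '\r\n' becomes '\r\r\n'.
doubled = written_bytes(decoded, '\r\n')

# newline='' writes the string untouched.
untouched = written_bytes(decoded, '')

print(doubled)    # b'4801 first\r\r\n4801 second\r\r\n'
print(untouched)  # b'4801 first\r\n4801 second\r\n'
```

The first result shows the extra carriage return described in the question; the second shows that newline='' leaves the existing line endings alone.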
Try the following and see if it makes a difference:
# change
with open(outfile, encoding='utf-8-sig', mode='w') as g:
# to
with open(outfile, encoding='utf-8-sig', mode='w', newline='') as g:
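Another option, offered only as a sketch and not part of the original answer, is to decode the bytes in memory the same way a text-mode read would, so the rest of the code sees '\n' line endings just as it did with the temporary file. The helper name decode_like_text_mode and the sample bytes are hypothetical. It relies on io.TextIOWrapper wrapped around io.BytesIO, with encoding='utf-8-sig' to drop a leading BOM (which decode('utf-8') would keep) and newline=None to translate '\r\n' to '\n' on read.

```python
import io
import re

def decode_like_text_mode(data, encoding='utf-8-sig'):
    """Decode bytes as a text-mode read would: drop a BOM, normalise line endings to LF."""
    # newline=None enables universal-newlines translation when reading.
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None) as stream:
        return stream.read()

def fix4801(text):
    # Same substitution as in the question, applied to an in-memory string.
    return re.sub(r'(\n4801.+\n)4801', r'\1 ', text)

# Hypothetical sample input: UTF-8 BOM followed by CRLF-terminated lines.
raw = b'\xef\xbb\xbfheader\r\n4801 a\r\n4801 b\r\n'

text = decode_like_text_mode(raw)
result = fix4801(text)
```

With the string normalised this way, writing result through open(outfile, encoding='utf-8-sig', mode='w') with the default newline=None should produce a single '\r\n' per line on Windows, since only bare '\n' characters are left to translate.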