Question
I have a text file whose first line contains Unicode characters; all the other lines are ASCII. I try to read the first line into one variable and all the other lines into another. However, when I use the following code:
```python
# -*- coding: utf-8 -*-
import codecs
import os

filename = '1.txt'
f = codecs.open(filename, 'r3', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()

print 'And now for something completely different:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()
```

I get the following output:

```
<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely different:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77
```
If I don't call readline() first, the whole file is read, not only the first 7 lines, with both codecs.open() and open().
Why does this happen? And why does codecs.open() read the file in binary mode, even though the 'r' mode is given?
Update: this is the original file: http://www1.datafilehost.com/d/0792d687
Solution

Because you used .readline() first, the codecs.open() file object has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines. If you call .readlines() again, the rest of the lines are returned:

```python
>>> f = codecs.open(filename, 'r3', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
```
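If you have to keep using codecs.open() on a Python 2.7 build affected by this bug, the observation that a second .readlines() call returns the rest suggests a defensive pattern: keep calling .readlines() until it returns an empty list. This is a minimal sketch under that assumption, written to run on Python 3; read_remaining_lines() and the sample file are hypothetical. On interpreters where the bug has been fixed, the first call already returns everything and the loop just makes one extra call that returns [].

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import tempfile


def read_remaining_lines(f):
    """Collect every remaining line from the file object f.

    Hypothetical helper: on codecs readers affected by the line-buffer bug,
    one readlines() call after readline() may return only the buffered
    lines, so keep calling readlines() until it returns an empty list.
    """
    lines = []
    while True:
        chunk = f.readlines()
        if not chunk:
            break
        lines.extend(chunk)
    return lines


if __name__ == '__main__':
    # Hypothetical sample: one non-ASCII header line plus 100 ASCII lines.
    path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(u'имя возраст город\n')
        for i in range(100):
            out.write(u'%d 20 Paris\n' % i)

    f = codecs.open(path, 'r', encoding='utf-8')
    header = f.readline()
    rest = read_remaining_lines(f)
    f.close()
    print(len(rest))  # 100
```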
The work-around is to not mix .readline() and .readlines():

```python
import codecs

filename = '1.txt'
f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ')  # take the first line
f.close()
```
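The same read-once-then-pop idea can be wrapped in a helper that accepts any already-open file object, so it works with codecs.open() as well as io.open(). This is a sketch written for Python 3; split_first_line() is a hypothetical name, and unlike the snippet above it strips the line ending ('\r\n' or '\n') from the header so the last name is clean, because codecs.open() reads in binary mode and does no newline translation.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import tempfile


def split_first_line(f):
    """Return (names, data_lines) using a single readlines() call on f.

    Hypothetical helper for the work-around: read every line once, then pop
    the header, instead of mixing readline() and readlines().
    """
    lines = f.readlines()
    if not lines:
        return [], []
    # codecs.open() does no newline translation, so strip '\r\n' or '\n'.
    names = lines.pop(0).rstrip('\r\n').split(' ')
    return names, lines


if __name__ == '__main__':
    # Hypothetical sample file with Windows line endings.
    path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
    with io.open(path, 'w', encoding='utf-8', newline='') as out:
        out.write(u'имя возраст город\r\n1 20 Paris\r\n2 30 Oslo\r\n')

    f = codecs.open(path, 'r', encoding='utf-8')
    names, data = split_first_line(f)
    f.close()
    print('%d names, %d data lines' % (len(names), len(data)))  # 3 names, 2 data lines
```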
This behaviour is a genuine bug, and the Python developers are aware of it; see Python issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is a lot more robust and versatile than the codecs module.
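A minimal sketch of the io.open() route, assuming Python 3 (where io.open is the same object as the built-in open()) and a throwaway sample file: mixing readline() and readlines() on an io text file returns all remaining lines. For comparison it also touches the second question: codecs.open() appends 'b' to the mode before opening the file (the codecs documentation states that underlying files are always opened in binary mode), so the stream it wraps reports mode 'rb'. split_header() and codecs_stream_mode() are hypothetical names.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import tempfile


def split_header(path, encoding='utf-8'):
    """Read the header with readline() and the rest with readlines().

    With io.open() (the built-in open() in Python 3), mixing the two
    calls is safe: readlines() returns every remaining line.
    """
    with io.open(path, 'r', encoding=encoding) as f:
        names = f.readline().rstrip('\n').split(' ')
        data = f.readlines()
    return names, data


def codecs_stream_mode(path):
    """Return the mode of the file object wrapped by codecs.open()."""
    f = codecs.open(path, 'r', encoding='utf-8')
    try:
        # codecs.open() appends 'b' to the mode before opening the file,
        # so the underlying stream is binary even though 'r' was passed.
        return f.stream.mode
    finally:
        f.close()


if __name__ == '__main__':
    # Hypothetical sample file: one non-ASCII header line plus 100 ASCII lines.
    path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(u'имя возраст город\n')
        for i in range(100):
            out.write(u'%d 20 Paris\n' % i)

    names, data = split_header(path)
    print('%d names, %d data lines' % (len(names), len(data)))  # 3 names, 100 data lines
    print(codecs_stream_mode(path))  # rb
```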