Python 2.7: strangely different behaviour of open() and codecs.open()

Problem description


I have a text file whose first line contains Unicode characters, while all other lines are ASCII. I try to read the first line into one variable and all other lines into another. However, when I use the following code:

# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()

I get the following output:

<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77

If I don't use readlines(), the whole file is read, not only the first 7 lines, with both codecs.open() and open().

Why does this happen? And why does codecs.open() read the file in binary mode even though the 'r' parameter is passed?

Update: This is the original file: http://www1.datafilehost.com/d/0792d687

Solution

Because you used .readline() first, the codecs.open() file has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.

If you call .readlines() again, the rest of the lines are returned:

>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71

The work-around is to not mix .readline() and .readlines():

f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ')  # take the first line.
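
The work-around can be tried end to end with a small self-contained sketch. The sample file here is made up for illustration (a hypothetical header line with three non-ASCII words followed by 77 ASCII rows); it is not the file from the question, so only the pattern carries over, not the counts.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical sample data: one non-ASCII header line plus 77 ASCII rows.
header = u"имя возраст город"
rows = [u"row %d" % i for i in range(77)]

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with codecs.open(path, "w", encoding="utf-8") as out:
    out.write(u"\n".join([header] + rows) + u"\n")

# Read everything with a single readlines() call, then split off the header,
# so .readline() and .readlines() are never mixed on the same file object.
with codecs.open(path, "r", encoding="utf-8") as f:
    data_f = f.readlines()
names_f = data_f.pop(0).split(u" ")

print(len(names_f))
print(len(data_f))
```

Because the whole file goes through one readlines() call, the reader's internal line buffer never comes into play.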

This behaviour is really a bug; the Python devs are aware of it, see issue 8260.

The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
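
A minimal sketch of the io.open() variant, again using a made-up sample file (hypothetical header and 77 rows, not the file from the question). With io.open(), calling .readline() and then .readlines() on the same file object returns the rest of the file in one go.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample data: one non-ASCII header line plus 77 ASCII rows.
header = u"имя возраст город"
rows = [u"row %d" % i for i in range(77)]

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"\n".join([header] + rows) + u"\n")

# io.open() decodes to unicode text, and mixing the two calls is safe here.
with io.open(path, "r", encoding="utf-8") as g:
    names_g = g.readline().split(u" ")
    data_g = g.readlines()

print(len(names_g))
print(len(data_g))
```

As for the 'rb' shown in the first printed file object: when an encoding is given, codecs.open() always opens the underlying file in binary mode, whatever mode string is passed in, and does the decoding itself. io.open() in mode 'r' opens a real text-mode file instead.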


08-30 23:29