Rows are lost when reading this tab-separated file with pandas read_csv
Problem description
I have a .text file with the following format, where the fields (index number, name and message) are separated by \t (tab-separated):
712 ben Battle of the Books
713 james i used to be in TOM
714 tomy i was in BOB once
715 ben Tournaments of Minds
716 tommy Also the Lion in the upcoming school play
717 tommy Can you guess
718 tommy P
...
which I read into a data frame with read_csv:
chat = pd.read_csv("f.text", sep = "\t", header = None, usecols = [2])
But the data frame has only 9812 rows, while the original file has more than 12428 lines (just 21 of them empty). This is quite weird. Do you have any idea? Thanks.
Recommended answer
I think you need to add the quoting parameter:
import csv
import pandas as pd
# QUOTE_NONE tells the parser to treat " as an ordinary character
chat = pd.read_csv("f.text", sep="\t", header=None, usecols=[2], quoting=csv.QUOTE_NONE)
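A likely reason this helps, though I have not seen your actual file: with the default quoting, a message that starts with a double quote opens a quoted field, and everything up to the next closing quote, newlines included, is read as a single value. The sketch below reproduces that effect on made-up data modelled on your sample. The quote characters and the `sample`, `default_chat` and `raw_chat` names are assumptions added for illustration.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the message on line 713 opens with a double quote
# and the matching quote only appears at the end of line 715.
sample = (
    "712\tben\tBattle of the Books\n"
    '713\tjames\t"i used to be in TOM\n'
    "714\ttomy\ti was in BOB once\n"
    '715\tben\tTournaments of Minds"\n'
    "716\ttommy\tCan you guess\n"
)

# Default quoting: lines 713-715 are read as one quoted field, so rows vanish.
default_chat = pd.read_csv(io.StringIO(sample), sep="\t", header=None, usecols=[2])

# QUOTE_NONE: every physical line becomes its own row, quotes stay in the text.
raw_chat = pd.read_csv(
    io.StringIO(sample), sep="\t", header=None, usecols=[2], quoting=csv.QUOTE_NONE
)

# Compare the row counts of the two reads.
print(len(default_chat), len(raw_chat))
```

If the two counts differ on your real file as well, stray quote characters in the messages are probably what swallowed the missing rows.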