Problem description
I am reading a large .txt file (>1GB) into R via fread. I am reading the file in directly from a .zip archive, via a bash command:
base = fread('unzip -p Folder.zip File.txt', sep = '|', header = FALSE,
stringsAsFactors = FALSE, na.strings="", quote = "", col.names = col_namesMain)
The text file separates entries with |, so a typical line might look like:
RRX|||02020||333293||||12123
However, there are many places where empty entries are denoted by adjacent separators with no space between them, e.g. || in the example line above.
When using fread, runs of adjacent separators are typically read in as part of a neighbouring field rather than as field boundaries, so the above line returns the following entries:
RRX, ||02020|, 333293|||, 12123
when it should instead be read as:
RRX, NA, NA, 02020, NA, 333293, NA, NA, NA, 12123
I have tried using read.table with the option skipNul = TRUE, and this works perfectly. However, there doesn't seem to be any option similar to skipNul for fread. I would much prefer to use fread over read.table if possible, since I have several very large files. Despite my searching, I haven't come across much discussion of this problem. Any help is much appreciated.
Recommended answer
This has been fixed in dev 1.12.3 on 15 Apr 2019 (see NEWS):
- fread() now skips embedded NUL (\0), #3400. Thanks to Marcus Davy for reporting with examples, and Roy Storey for the initial PR.
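For data.table versions that predate this fix, one possible workaround is to strip the NUL bytes in the shell before fread sees them. The following is only a minimal sketch: it assumes the "empty" fields really do contain embedded NUL bytes (as the fix and the skipNul behaviour suggest), and sample.txt and clean.txt are hypothetical stand-ins for the real File.txt.

```shell
# Build a small sample line with NUL bytes sitting between some separators.
# sample.txt is a hypothetical stand-in for the real File.txt.
printf 'RRX|\000|\000|02020|\000|333293\n' > sample.txt

# Delete every NUL byte; the | separators themselves are left untouched.
tr -d '\000' < sample.txt > clean.txt

# Show the cleaned line, which now has plain adjacent separators.
cat clean.txt
```

If this matches the data, the same tr -d '\000' stage could probably be appended to the unzip -p command passed to fread (with the backslash escaped as R string syntax requires), so the file is still streamed straight from the archive.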