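I want to read a CSV file whose first line contains the variable names and whose remaining lines contain the values of those variables. Some variables are numeric, some are text, and some are even empty.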
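path = "path/file.csv"
f = file(path, 'r')                             # open once, so each readLines() continues where the last one stopped
varnames = strsplit(readLines(f, 1), ",")[[1]]  # first line: the variable names
data = strsplit(readLines(f, 1), ",")[[1]]      # next line: one record, every field a character string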
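Now data holds all the variables, but only as character strings. How can I make it recognize the type of each variable being read, the way read.csv does? I need to read the data line by line (or n lines at a time), because the whole dataset is too large to read into R.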
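For a single record that has already been split into strings, base R's type.convert() (the function read.table() and read.csv() use to guess column types) can be applied field by field. This is a minimal sketch, assuming plain comma-separated fields with no quoted commas; the header and record values are hypothetical. Two caveats: each type is guessed from one value only, so an empty field becomes a logical NA, and strsplit() drops a trailing empty field (strsplit("a,b,", ",") returns only "a" and "b").

```r
# Hypothetical sample: a header line and one record with an empty field
varnames <- strsplit("id,note,score,name", ",")[[1]]
data <- strsplit("7,,3.5,alice", ",")[[1]]

# Guess the type of each field separately, as read.csv() would for a
# one-row column; as.is = TRUE keeps text as character, not factor
row <- lapply(data, type.convert, as.is = TRUE)
names(row) <- varnames

str(row)
```

Because the guess is made per value, the same variable can come back as different types in different rows; reading several lines at once with read.csv() gives each column more values to base its guess on.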
Best answer
Building on DWin's comment, you could try something like this:
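read.clump <- function(file, lines, clump){
  if(clump > 1){
    # Read only the header row so its names can be reused for this clump
    header <- read.csv(file, nrows = 1, header = FALSE,
                       stringsAsFactors = FALSE, strip.white = TRUE)
    # A file path is reopened on every read.csv() call, so the header line
    # must be skipped again (+1); an open textConnection has already
    # consumed it
    offset <- if(is.character(file)) 1 else 0
    p <- read.csv(file, skip = lines*(clump-1) + offset,
                  nrows = lines, header = FALSE)
    names(p) <- unlist(header)
  } else {
    # The first clump reads the header together with its rows
    p <- read.csv(file, nrows = lines)
  }
  return(p)
}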
You should probably also add some error handling/checking to the function.
Then, with the following sample data:
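As an illustration of such checks, the sketch below validates the arguments before any reading happens. check_clump_args() is a hypothetical helper, not part of the original answer, and it assumes lines and clump must each be a single positive whole number:

```r
# Hypothetical helper: validate read.clump()'s arguments before reading
check_clump_args <- function(file, lines, clump) {
  is_count <- function(x) {
    is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 1 && x == round(x)
  }
  if (!is_count(lines)) stop("'lines' must be a single positive whole number")
  if (!is_count(clump)) stop("'clump' must be a single positive whole number")
  # A path must point to an existing file; connections are not checked here
  if (is.character(file) && !file.exists(file)) {
    stop("file not found: ", file)
  }
  invisible(TRUE)
}

# Example: a small temporary file, one valid call and one rejected call
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), tmp)
check_clump_args(tmp, lines = 2, clump = 1)
try(check_clump_args(tmp, lines = 2, clump = 0))  # stops: clump is 0
```

Calling such a helper as the first line of read.clump() makes bad arguments fail with a clear message instead of an obscure read.csv() error.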
x = "letter1, letter2
a, b
c, d
e, f
g, h
i, j
k, l"
> read.clump(textConnection(x), lines = 2, clump = 1)
letter1 letter2
1 a b
2 c d
> read.clump(textConnection(x), lines = 2, clump = 2)
letter1 letter2
1 e f
2 g h
> read.clump(textConnection(x), lines = 3, clump = 1)
letter1 letter2
1 a b
2 c d
3 e f
> read.clump(textConnection(x), lines = 3, clump = 2)
letter1 letter2
1 g h
2 i j
3 k l
Now you just need to *apply (lapply, sapply, ...) over the clumps.
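A minimal sketch of that last step, assuming the data sit in a file on disk rather than in a textConnection. It repeats a version of read.clump() that skips the extra header line for file paths (the "+1" case the answer's comment mentions); the temporary file, its contents and the count_lines() helper are hypothetical:

```r
# read.clump() with the header offset applied when 'file' is a path
read.clump <- function(file, lines, clump) {
  if (clump > 1) {
    header <- read.csv(file, nrows = 1, header = FALSE,
                       stringsAsFactors = FALSE, strip.white = TRUE)
    offset <- if (is.character(file)) 1 else 0
    p <- read.csv(file, skip = lines * (clump - 1) + offset,
                  nrows = lines, header = FALSE)
    names(p) <- unlist(header)
  } else {
    p <- read.csv(file, nrows = lines)
  }
  p
}

# Hypothetical helper: count lines without loading the whole file at once
count_lines <- function(path, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    got <- length(readLines(con, n = chunk))
    if (got == 0L) break
    n <- n + got
  }
  n
}

# Hypothetical data: a header plus six rows in a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("letter1,letter2", "a,b", "c,d", "e,f", "g,h", "i,j", "k,l"), tmp)

lines_per_clump <- 2
n_rows <- count_lines(tmp) - 1                 # data rows, header excluded
n_clumps <- ceiling(n_rows / lines_per_clump)

# *apply over the clump indices; each element is one typed data frame
clumps <- lapply(seq_len(n_clumps), function(i) {
  read.clump(tmp, lines = lines_per_clump, clump = i)
})
```

In practice you would summarise or write out each clump inside the function passed to lapply() rather than keep them all in memory. Note that with a file path every call re-scans the file from the top to honour skip, so the cost grows with the clump index; keeping one connection open and calling read.csv(con, nrows = ..., header = FALSE) repeatedly avoids that rescan.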