I have been given data in a text file that looks like this:
Measurement: mc
Loop:
var1=0, var2=-5, var3=1.8
values:
iteration data
0 1.203
1 1.206
2 2.206
3 1.201
4 1.204
5 1.204
6 1.204
statistics:
max 1.206
min 1.201
mean 1.204
stddev 0.001
avgdev 0.001
failedtimes 0
Measurement: mc
Loop:
var1=10, var2=-5, var3=1.8
values:
iteration data
0 1.203
1 1.206
2 2.206
3 1.201
statistics:
max 1.206
min 1.201
mean 1.204
stddev 0.001
avgdev 0.001
failedtimes 0
I would like to get the data into a more conventional format, for example:
var1, var2, var3, iteration, data,
0, -5, 1.8, 0, 1.203,
0, -5, 1.8, 1, 1.206,
...
10, -5, 1.8, 0, 1.203,
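I am having trouble parsing data like this. Please help.
Best answer
One approach is to read the file with readLines and use a few simple regular expressions
to pick out the relevant lines.
Your data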
txt <-
"Measurement: mc
Loop:
var1=0, var2=-5, var3=1.8
values:
iteration data
0 1.203
1 1.206
2 2.206
3 1.201
4 1.204
5 1.204
6 1.204
statistics:
max 1.206
min 1.201
mean 1.204
stddev 0.001
avgdev 0.001
failedtimes 0
Measurement: mc
Loop:
var1=10, var2=-5, var3=1.8
values:
iteration data
0 1.203
1 1.206
2 2.206
3 1.201
statistics:
max 1.206
min 1.201
mean 1.204
stddev 0.001
avgdev 0.001"
# Read in : you can pass the file path instead of textConnection
r = readLines(textConnection(txt))
# Find indices of relevant parts of string that you want to keep
id1 = grep("var", r)
id2 = grep("iteration", r)
id3 = grep("statistics", r)
# indices for iteration data
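# SIMPLIFY = FALSE keeps the result a list even when all blocks have the same number of rows
m = mapply(seq, id2, id3 - 1, SIMPLIFY = FALSE)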
# Use read.table to parse the relevant rows
lst <- lapply(seq_along(m), function(x)
cbind(read.table(text=r[id1][x], sep=","), #var data
read.table(text=r[m[[x]]], header=TRUE))) # iteration data
dat <- do.call(rbind, lst)
# Remove the var= text and convert to numeric
dat[] <- lapply(dat, function(x) as.numeric(gsub("var\\d+=", "", x)))
dat
# V1 V2 V3 iteration data
# 1 0 -5 1.8 0 1.203
# 2 0 -5 1.8 1 1.206
# 3 0 -5 1.8 2 2.206
# 4 0 -5 1.8 3 1.201
# 5 0 -5 1.8 4 1.204
# 6 0 -5 1.8 5 1.204
# 7 0 -5 1.8 6 1.204
# 8 10 -5 1.8 0 1.203
# 9 10 -5 1.8 1 1.206
# 10 10 -5 1.8 2 2.206
# 11 10 -5 1.8 3 1.201
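The result above carries read.table's default names V1, V2, V3 rather than the var1, var2, var3 asked for in the question. Below is a minimal sketch of one way to keep the names, assuming every var line is a comma-separated list of name=value pairs. parse_vars is a hypothetical helper introduced here for illustration, not part of the answer above.

```r
# Hypothetical helper: turn "var1=0, var2=-5, var3=1.8" into a one-row data frame
# whose column names are taken from the left-hand side of each "name=value" pair.
parse_vars <- function(line) {
  pairs <- strsplit(trimws(strsplit(line, ",")[[1]]), "=")
  vals  <- as.numeric(vapply(pairs, `[`, character(1), 2))
  names(vals) <- vapply(pairs, `[`, character(1), 1)
  as.data.frame(as.list(vals))
}

vars <- parse_vars("var1=0, var2=-5, var3=1.8")

# Two iteration rows in the same layout as the file (header line plus data lines)
iter <- read.table(text = c("iteration data", "0 1.203", "1 1.206"), header = TRUE)

# cbind recycles the single row of vars across all iteration rows
out <- cbind(vars, iter)

# Write as CSV to the console; pass file = "some/path.csv" to write to disk instead
write.csv(out, row.names = FALSE)
```

Inside the lapply call above, parse_vars(r[id1][x]) could stand in for the first read.table call, in which case the later gsub step is no longer needed.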
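It may actually be cleaner to split the data into one section per measurement and then apply a function to each section, i.e.
# Start a new group at every line that contains "measure" (case-insensitive)
sp <- split(r, cumsum(grepl("measure", r, ignore.case = TRUE)))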
# Function to parse
fun <- function(x){
id1 = grep("var", x)
id2 = grep("iteration", x)
id3 = grep("statistics", x)
m = seq(id2, id3-1)
cbind(read.table(text=x[id1], sep=","),
read.table(text=x[m], header=TRUE))
}
lst <- lapply(sp, fun)
Then continue as before: combine the pieces with do.call(rbind, lst) and strip the var= text.
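For reference, the split-based route can be sketched end to end as follows. The sample is shortened but assumed to follow the same layout as the data above; the final rownames reset is an addition for tidiness, not part of the original answer.

```r
# Shortened sample in the same layout as the original file
txt <- "Measurement: mc
Loop:
var1=0, var2=-5, var3=1.8
values:
iteration data
0 1.203
1 1.206
statistics:
max 1.206
Measurement: mc
Loop:
var1=10, var2=-5, var3=1.8
values:
iteration data
0 1.203
statistics:
max 1.203"

r <- readLines(textConnection(txt))

# One group of lines per "Measurement" header
sp <- split(r, cumsum(grepl("measure", r, ignore.case = TRUE)))

# Parse a single section: the var line plus the rows between "iteration" and "statistics"
fun <- function(x) {
  id1 <- grep("var", x)
  id2 <- grep("iteration", x)
  id3 <- grep("statistics", x)
  cbind(read.table(text = x[id1], sep = ","),
        read.table(text = x[seq(id2, id3 - 1)], header = TRUE))
}

dat <- do.call(rbind, lapply(sp, fun))

# Remove the var= text and convert every column to numeric
dat[] <- lapply(dat, function(x) as.numeric(gsub("var\\d+=", "", x)))

# Replace the "1.1", "1.2", ... row names that rbind derives from the list names
rownames(dat) <- NULL
dat
```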
Regarding r - importing poorly structured data in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41904982/