Problem description
I am trying to read a binary file in R containing a simple 2D array of 360x180 values. For reference, the binary file can be found here:
http://transcom.project.asu.edu/download/transcom03/smoothmap.fix.2.bin
Here is what the readme for this .bin says:
My code:
to.read <- file("smoothmap.fix.2.bin", "rb")
raw.transcom <- readBin(to.read, integer(), n = 360*180, size = 4, endian = "big")
transcom <- matrix(raw.transcom, 180, 360, byrow = F)
Now raw.transcom contains only junk values:
unique(raw.transcom)
[1] 259200 0 1101004800 1082130432 1092616192 1097859072 1100480512 1102053376 1086324736
[10] 1077936128 1101529088 1095761920 1096810496 1099956224 1091567616 1084227584 1090519040 1094713344
[19] 1099431936 1073741824 1093664768 1088421888 1065353216 1098907648
Why is this happening?
I've been looking at this for an hour now and I'm stumped. I played around with the endianness setting and the 'size' argument of readBin, but that did not help.
How can I read in this file correctly?
Recommended answer
Well, I didn't have time to poke at the "R" way to do this, but I do have access to GDL and found this, so I threw together:
Data = read_binary('smoothmap.fix.2.bin',DATA_TYPE=4,ENDIAN='big');
Data = Data[1:64800]
Data = reform(Data,[360,180])
openw,unit,'testfile.dat',/get_lun
printf,unit,Data
free_lun,unit
and managed to produce: http://rud.is/dl/testfile.dat.gz
If you grab that and do:
x <- as.numeric(scan("testfile.dat.gz", "numeric"))
length(x)
## [1] 64800
table(x)
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
## 7951 1643 1189 796 868 1688 864 2345 2487 509 733 1410 5144 2388 2433 4111 7617 2450 1671 2058 9161 2334 2950
It definitely looks like it has the right values for the definition you specified, and you can turn that into a matrix.
Check back, though, as I now need to figure out how to do this in R :-)
Update
Got it!
I'm really kind of glad I found the IDL code to verify the R results.
x <- readBin("smoothmap.fix.2.bin", "raw", file.size("smoothmap.fix.2.bin"))
x <- x[-(1:4)]
x <- x[-((length(x)-3):length(x))]
table(readBin(rawConnection(x), "numeric", 360*180, 4, endian="big"))
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
## 7951 1643 1189 796 868 1688 864 2345 2487 509 733 1410 5144 2388 2433 4111 7617 2450 1671 2058 9161 2334 2950
Ideally, we'd check that the first 4 and last 4 bytes are equal, but this hack should get you through.
Putting it all together
Added validation bits of code…
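As a rough illustration of why the original integer() read looked like junk: the values in the question appear to be the bit patterns of 4-byte floats, and the leading 259200 equals 360*180*4. The sketch below takes a few of those integers from the question's output and reinterprets their bytes as big-endian floats; the variable names are made up for this example.

```r
# A few of the "junk" integers reported in the question
junk <- c(1065353216L, 1073741824L, 1077936128L, 1101004800L)

# Serialize them back to the same big-endian bytes that were in the file ...
bytes <- writeBin(junk, raw(), size = 4, endian = "big")

# ... and decode those bytes as 4-byte floats instead of integers
as_float <- readBin(bytes, "numeric", n = length(junk), size = 4, endian = "big")
print(as_float)

# The first value readBin returned, 259200, is not data at all:
# it matches the payload size in bytes.
print(259200 / 4 == 360 * 180)
```

The decoded values land in the 0 to 22 range shown in the table above, which suggests the data were fine all along and only the `what` argument (plus the 4-byte marker at each end) was off.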
#' Read in a binary array, likely written with IDL
#'
#' @param x path to file (auto-expanded & tested for existence)
#' @param n number of `float` elements to read in
#' @param endian endian-ness (default `big`)
#' @return numeric vector of length `n`
read_binary_float <- function(x, n, endian="big") {

  x <- normalizePath(path.expand(x))

  x <- readBin(con = x, what = "raw", n = file.size(x))

  first4 <- x[1:4] # extract front bits
  last4 <- x[(length(x)-3):length(x)] # extract back bits

  # convert both to long ints
  f4c <- rawConnection(first4)
  on.exit(close(f4c), add=TRUE)
  f4 <- readBin(con = f4c, what = "integer", n = 1, size = 4L, endian=endian)

  l4c <- rawConnection(last4)
  on.exit(close(l4c), add=TRUE)
  l4 <- readBin(con = l4c, what = "integer", n = 1, size = 4L, endian=endian)

  # validation
  stopifnot(f4 == l4) # check front/back are equal
  stopifnot(f4 == n*4) # check if `n` matches expected record count

  # strip off front and back bits
  x <- x[-(1:4)]
  x <- x[-((length(x)-3):length(x))]

  # slurp it all in
  rc <- rawConnection(x)
  on.exit(close(rc), add=TRUE)
  readBin(con = rc, what = "numeric", n = n, size = 4L, endian=endian)

}
Simple example:
library(magrittr)

read_binary_float("smoothmap.fix.2.bin", 360*180) %>%
  matrix(nrow = 360, ncol = 180) %>%
  image()
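A small sketch of why the dimensions are given as nrow = 360, ncol = 180 here, whereas the question used matrix(raw.transcom, 180, 360). It assumes the file stores the 360-long dimension fastest, as Fortran and IDL arrays do; R's matrix() also fills column by column, so the first dimension must be the fast one. The 3 x 2 toy vector stands in for the real data.

```r
# Toy stand-in for the 360*180 vector: 6 values for a 3 x 2 grid
v <- 1:6

# Column-major fill: the first index varies fastest, like Fortran/IDL arrays
m <- matrix(v, nrow = 3, ncol = 2)
print(m[2, 1])  # second value in the stream
print(m[1, 2])  # fourth value in the stream

# Swapping the dimensions is not the same as transposing:
# the values end up in different cells.
wrong <- matrix(v, nrow = 2, ncol = 3)
print(identical(wrong, t(m)))
```

If the plot needs the other orientation, transposing the correctly shaped matrix with t() is the safer route than swapping nrow and ncol.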
This file seems to conform to the Fortran "unformatted I/O" spec: https://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vnc4/index.html, which confirmed the
"# records" | record | record | … | record | "# records"
layout we saw. (In this file the marker value is 259200 = 360*180*4, i.e. the payload size in bytes, which is why the functions here compare it against n*4 or divide it by the element size.) So the function could be generalized to support more than just float conversion:
read_binary_array <- function(x, type=c("byte", "integer", "float"), endian="big") {

  type <- match.arg(trimws(tolower(type)), c("byte", "integer", "float"))
  type_size <- unname(c("byte"=1, "integer"=4, "float"=4)[type])

  x <- normalizePath(path.expand(x))
  x <- readBin(con = x, what = "raw", n = file.size(x))

  first4 <- x[1:4]
  last4 <- x[(length(x)-3):length(x)]

  f4c <- rawConnection(first4)
  on.exit(close(f4c), add=TRUE)
  f4 <- readBin(con = f4c, what = "integer", n = 1, size = 4L, endian=endian)

  l4c <- rawConnection(last4)
  on.exit(close(l4c), add=TRUE)
  l4 <- readBin(con = l4c, what = "integer", n = 1, size = 4L, endian=endian)

  stopifnot(f4 == l4) # check front/back are equal
  stopifnot((f4 %% type_size == 0)) # should have nothing left over

  n_rec <- f4 / type_size

  message(sprintf("Reading in %s records...", scales::comma(n_rec)))

  x <- x[-(1:4)]
  x <- x[-((length(x)-3):length(x))]

  rc <- rawConnection(x)
  on.exit(close(rc), add=TRUE)

  what <- switch(type, byte="raw", integer="integer", float="numeric")

  dat <- readBin(con = rc, what = what, n = n_rec, size = type_size, endian=endian)

  dat

}
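For trying the idea out without the download, here is a minimal round-trip sketch. It writes a tiny synthetic file in the layout assumed above (a 4-byte big-endian length marker, the float payload, then the same marker again) and reads it back. Unlike the functions above, it reads the three pieces sequentially from a file connection rather than slurping raw bytes; that is just an alternative, and all names in it are invented for the example.

```r
# Synthetic payload; these values are exactly representable as 4-byte floats
vals <- c(0, 1, 2.5, 22)
marker <- as.integer(length(vals) * 4L)   # payload size in bytes

# Write marker | payload | marker to a temporary file
tmp <- tempfile(fileext = ".bin")
con <- file(tmp, "wb")
writeBin(marker, con, size = 4, endian = "big")
writeBin(vals, con, size = 4, endian = "big")
writeBin(marker, con, size = 4, endian = "big")
close(con)

# Read the pieces back in order from a binary connection
con <- file(tmp, "rb")
head_len <- readBin(con, "integer", n = 1, size = 4, endian = "big")
payload  <- readBin(con, "numeric", n = head_len / 4, size = 4, endian = "big")
tail_len <- readBin(con, "integer", n = 1, size = 4, endian = "big")
close(con)
unlink(tmp)

# Same validation idea as above: both markers must agree
stopifnot(head_len == tail_len)
print(payload)
```

Reading sequentially avoids holding a second raw copy of the file in memory, at the cost of only discovering a mismatched trailing marker after the payload has been read.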