Question
TL;DR: How do I use Sys.glob() within unzip()?
I have multiple .zip files and I want to extract only one file from each archive.
For example, one of the archives contains the following files:
[1] "cmc-20150531.xml" "cmc-20150531.xsd" "cmc-20150531_cal.xml" "cmc-20150531_def.xml" "cmc-20150531_lab.xml"
[6] "cmc-20150531_pre.xml"
I want to extract the first file because it matches a pattern. In order to do that I use the following command:
unzip("zip-archive.zip", files=Sys.glob("[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"))
However, the command doesn't work, and I don't know why. R just extracts all files in the archive.
On the other hand, the following command works:
unzip("zip-archive.zip", files="cmc-20150531.xml")
How do I use Sys.glob() within unzip()?
Recommended answer
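Sys.glob expands wildcards against files that already exist on disk, so the argument passed to your unzip call depends on what files are in your working directory, not on what is inside the archive. If nothing there matches, Sys.glob returns an empty character vector, which would explain why unzip extracts everything.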
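To see this behaviour in isolation, here is a minimal sketch that runs the glob against a fresh temporary directory. The directory and the shortened pattern (using `*` instead of eight digit classes) are illustrative assumptions, not part of the original question.

```r
# Use a fresh, empty temporary directory so the result is predictable.
tmp <- tempfile("glob-demo")
dir.create(tmp)

# Shortened version of the glob from the question (hypothetical simplification).
pattern <- file.path(tmp, "[a-z][a-z][a-z]-*.xml")

# Nothing on disk matches yet, so the glob expands to an empty vector.
before <- Sys.glob(pattern)
length(before)
# [1] 0

# Once a matching file exists on disk, the same glob finds it.
file.create(file.path(tmp, "cmc-20150531.xml"))
after <- basename(Sys.glob(pattern))
after
# [1] "cmc-20150531.xml"
```

The contents of a zip archive never enter into this: Sys.glob only looks at the file system.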
Perhaps you want to call unzip with list=TRUE first, which returns the list of files in the zip without extracting anything, and then use some pattern matching to select the files you want.
See ?grep for info on matching strings with patterns. These patterns are "regular expressions" rather than "glob" expansions, but you should be able to work with that.
Here is a concrete example:
# what's in the zip?
files = unzip("c.zip", list=TRUE)$Name
files
[1] "l_spatial.dbf" "l_spatial.shp" "l_spatial.shx" "ls_polys_bin.dbf"
[5] "ls_polys_bin.shp" "ls_polys_bin.shx" "rast_jan90.tif"
# what files have "dbf" in them:
files[grepl("dbf",files)]
[1] "l_spatial.dbf" "ls_polys_bin.dbf"
# extract just those:
unzip("c.zip", files=files[grepl("dbf",files)])
The regular expression for your glob
"[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"
would be
"^[a-z]{3}-[0-9]{8}\\.xml$"
That matches the start of the string ("^"), three lowercase letters a-z, a dash, eight digits, a literal dot, "xml", and the end of the string ("$"). The dot needs two backslashes: one because a dot means "any one character" in regular expressions, and another because R needs a backslash to escape a backslash in a string literal.
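As a rough check, the sketch below applies that regular expression to the file names listed in the question and then, only if the archive is actually present, repeats the selection on the real listing before extracting. The variable names are illustrative, and "zip-archive.zip" is the placeholder name from the question.

```r
# File names copied from the archive listing in the question.
files <- c("cmc-20150531.xml", "cmc-20150531.xsd", "cmc-20150531_cal.xml",
           "cmc-20150531_def.xml", "cmc-20150531_lab.xml", "cmc-20150531_pre.xml")

pattern <- "^[a-z]{3}-[0-9]{8}\\.xml$"

# Keep only the names that match the whole pattern.
matched <- files[grepl(pattern, files)]
matched
# [1] "cmc-20150531.xml"

# With a real archive, take the names from the listing instead.
zipfile <- "zip-archive.zip"
if (file.exists(zipfile)) {
  in_zip <- unzip(zipfile, list = TRUE)$Name
  wanted <- in_zip[grepl(pattern, in_zip)]
  # Skip the call when nothing matched; an empty selection appears to
  # extract everything, as observed in the question.
  if (length(wanted) > 0) unzip(zipfile, files = wanted)
}
```

If the archive stores its files inside subdirectories, the "^" anchor will not match the full entry name; applying the pattern to basename(in_zip) is one way around that.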