Using Sys.glob() within unzip()

Question

TL;DR: How do I use Sys.glob() within unzip()?

I have multiple .zip files and I want to extract only one file from each archive.

For example, one of the archives contains the following files:

[1] "cmc-20150531.xml"     "cmc-20150531.xsd"     "cmc-20150531_cal.xml" "cmc-20150531_def.xml" "cmc-20150531_lab.xml"
[6] "cmc-20150531_pre.xml"

I want to extract the first file because it matches a pattern. To do that, I use the following command:

unzip("zip-archive.zip", files=Sys.glob("[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"))

However, the command doesn't work, and I don't know why. R just extracts all the files in the archive.

On the other hand, the following command works:

unzip("zip-archive.zip", files="cmc-20150531.xml")

How do I use Sys.glob() within unzip()?

Recommended answer

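Sys.glob expands wildcard patterns against files that already exist on disk; it never looks inside the archive. So the argument passed to your unzip call depends on which files happen to be in your working directory. If nothing there matches, Sys.glob returns an empty character vector, which would explain why unzip extracted everything.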
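To see this behaviour in isolation, here is a minimal sketch that runs Sys.glob in a fresh temporary directory before and after a matching file exists. The directory name and the shortened pattern are made up for illustration; no archive is involved.

```r
# Work in a fresh, empty directory so the result is predictable.
demo_dir <- tempfile("glob-demo")   # hypothetical scratch directory
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Shortened version of the glob from the question (an assumption for brevity).
pattern <- "[a-z][a-z][a-z]-*.xml"

# Nothing on disk matches yet, so the result is an empty character vector.
before <- Sys.glob(pattern)

# Create a matching file on disk, then glob again.
file.create("cmc-20150531.xml")
after <- Sys.glob(pattern)

# Restore the original working directory.
setwd(old_wd)
```

Here `before` is empty and `after` holds the newly created file name, which shows that the result depends only on the directory contents.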

Perhaps you want to call unzip with list=TRUE first to return the list of files in the zip, and then use some pattern matching to select the files you want.

See ?grep for info on matching strings with patterns. These patterns are "regular expressions" rather than "glob" expansions, but you should be able to work with that.
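If you would rather keep writing wildcard patterns, one option is utils::glob2rx(), which converts a simple glob into a regular expression. The sketch below uses hard-coded file names in place of a real archive listing. As far as I know glob2rx() only translates `*` and `?`, so bracket classes such as `[a-z]` still need to be written as a regular expression by hand.

```r
# File names as they might come back from unzip(zipfile, list = TRUE)$Name
# (hard-coded here so the sketch runs without an archive).
files <- c("l_spatial.dbf", "l_spatial.shp", "rast_jan90.tif")

# glob2rx() turns a simple wildcard pattern into a regular expression.
rx <- glob2rx("*.dbf")

# grepl() then tests each name against that regular expression.
dbf_files <- files[grepl(rx, files)]
```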

Here is a concrete example:

# What's in the zip?
files <- unzip("c.zip", list = TRUE)$Name
files
# [1] "l_spatial.dbf"    "l_spatial.shp"    "l_spatial.shx"    "ls_polys_bin.dbf"
# [5] "ls_polys_bin.shp" "ls_polys_bin.shx" "rast_jan90.tif"

# Which file names contain "dbf"?
files[grepl("dbf", files)]
# [1] "l_spatial.dbf"    "ls_polys_bin.dbf"

# Extract just those:
unzip("c.zip", files = files[grepl("dbf", files)])

The regular expression for your glob

 "[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"

would be

 "^[a-z]{3}-[0-9]{8}\\.xml$"

That matches the start of the string ("^"), three lower-case letters a-z, a dash, eight digits, a literal dot, "xml", and the end of the string ("$"). The dot needs two backslashes: one because an unescaped dot means "any one character" in a regular expression, and another because R requires a backslash to escape a backslash inside a string literal.
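Putting the pieces together, the following sketch applies that regular expression to the file names listed in the question and wraps the list, filter, and extract steps in a helper. The function name `extract_matching` is hypothetical, not part of R, and the names are hard-coded so the filtering step can be run without the archive.

```r
# Names listed in the question, hard-coded so this runs without the archive.
files <- c("cmc-20150531.xml", "cmc-20150531.xsd", "cmc-20150531_cal.xml",
           "cmc-20150531_def.xml", "cmc-20150531_lab.xml",
           "cmc-20150531_pre.xml")

pattern <- "^[a-z]{3}-[0-9]{8}\\.xml$"

# Keep only the names that match the whole pattern.
wanted <- files[grepl(pattern, files)]

# Hypothetical helper (not part of R): list the archive, filter, extract.
extract_matching <- function(zipfile, pattern, exdir = ".") {
  members <- unzip(zipfile, list = TRUE)$Name
  hits <- members[grepl(pattern, members)]
  # Guard against an empty selection, which appears to make unzip()
  # extract everything (the behaviour reported in the question).
  if (length(hits) == 0L) {
    return(invisible(character(0)))
  }
  unzip(zipfile, files = hits, exdir = exdir)
}
```

With several archives, the helper could be called in a loop, for example `for (z in list.files(pattern = "\\.zip$")) extract_matching(z, pattern)`.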
