Problem description
I'm trying to extract the "Number" of "Humans" in the string below, for example:
string <- c("ProjectObjectives|Objectives_NA, PublishDate|PublishDate_NA, DeploymentID|DeploymentID_NA, Species|Human|Gender|Female, Species|Cat|Number|1, Species|Human|Number|1, Species|Human|Position|Left")
The position of this text within the string changes constantly, so I need R to search the string, find "Species|Human|Number|", and return the number that follows it (here, 1).
Apologies if this duplicates another thread. I've looked here (extract a substring in R according to a pattern) and here (R extract part of string), but I haven't had any luck.
Any ideas?
Recommended answer
Use a capturing approach: capture one or more digits (\d+) after the known substring (just escape the | symbols):
> string <- c("ProjectObjectives|Objectives_NA, PublishDate|PublishDate_NA, DeploymentID|DeploymentID_NA, Species|Human|Gender|Female, Species|Cat|Number|1, Species|Human|Number|1, Species|Human|Position|Left")
> pattern = "Species\\|Human\\|Number\\|(\\d+)"
> unlist(regmatches(string,regexec(pattern,string)))[2]
[1] "1"
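The capturing approach can be wrapped in a small helper that also copes with strings where the pattern is absent. This is a minimal sketch, not part of the original answer: the name extract_number and its species argument are hypothetical, and it assumes the species name contains no regex metacharacters.

```r
# Hypothetical helper (not from the original answer).
# Assumes `species` contains no regex metacharacters.
extract_number <- function(x, species = "Human") {
  # "|" is escaped because an unescaped "|" means alternation in a regex.
  pattern <- paste0("Species\\|", species, "\\|Number\\|(\\d+)")
  hits <- regmatches(x, regexec(pattern, x))
  # Each element of `hits` is c(full_match, group_1),
  # or character(0) when the pattern did not match.
  vapply(
    hits,
    function(hit) if (length(hit) >= 2) as.integer(hit[2]) else NA_integer_,
    integer(1)
  )
}

s <- "Species|Cat|Number|3, Species|Human|Number|12, Species|Human|Position|Left"
extract_number(s)          # 12
extract_number(s, "Cat")   # 3
extract_number("no match") # NA
```

Returning NA_integer_ for a non-match keeps the result the same length as the input, which is convenient when the function is applied to a whole column of strings.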
A variation is to use regmatches/regexpr with a PCRE regex (perl=TRUE) and a lookbehind:
> pattern="(?<=Species\\|Human\\|Number\\|)\\d+"
> regmatches(string,regexpr(pattern,string, perl=TRUE))
[1] "1"
Here, the left-hand context is placed inside a non-consuming pattern, a positive lookbehind, (?<=...). It must be present immediately before the digits, but it is not included in the returned match.
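To see what "non-consuming" means in practice, the following sketch compares the same pattern with and without the lookbehind. The sample string s is made up for illustration.

```r
# Illustrative input (not from the original question).
s <- "Species|Cat|Number|3, Species|Human|Number|12"

# Without the lookbehind, the prefix is consumed and becomes part of the match.
with_prefix <- regmatches(s, regexpr("Species\\|Human\\|Number\\|\\d+", s, perl = TRUE))

# With the lookbehind, the prefix is only checked, so the match is just the digits.
digits_only <- regmatches(s, regexpr("(?<=Species\\|Human\\|Number\\|)\\d+", s, perl = TRUE))

with_prefix  # "Species|Human|Number|12"
digits_only  # "12"
```

Note that PCRE lookbehinds need a fixed-length pattern, which holds here because the prefix is a literal string.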
The same result can be achieved with the \K operator, which discards everything matched so far from the returned match:
> pattern="Species\\|Human\\|Number\\|\\K\\d+"
> regmatches(string,regexpr(pattern,string, perl=TRUE))
[1] "1"
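regexpr returns only the first match. If the string might contain more than one "Species|Human|Number|" entry, a possible variant (an assumption beyond the original question) is to switch to gregexpr, which returns every match. The sample string below is made up for illustration.

```r
# Illustrative input with two Human counts (not from the original question).
s <- "Species|Human|Number|2, Species|Cat|Number|1, Species|Human|Number|7"

pattern <- "Species\\|Human\\|Number\\|\\K\\d+"

# gregexpr finds all matches; regmatches returns a list with one element per input string.
all_hits <- regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]

all_hits              # "2" "7"
as.integer(all_hits)  # 2 7
```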