Problem description
My data frame has a column:
nf1$Info = AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693
I'm trying to extract a specific value from this column.
For example, I'm interested in "MQRankSum" and I used:
str_extract(nf1$Info,"[MQRankSum]+=[:punct:]+[0-9]+[.]+[0-9]+")
It returns the value of BaseQRankSum instead of MQRankSum.
Recommended answer
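Enclosing characters in square brackets creates a character class that matches any one of the listed characters, so [yes]+ matches yyyyyyyyy, eyyyyss, etc. In the same way, [MQRankSum]+ matches any run of the letters M, Q, R, a, n, k, S, u and m, which is why it can match the QRankSum part of BaseQRankSum, the first such run followed by =.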
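A small sketch to see this character-class behavior on the question's data. It assumes the stringr package is installed, and it uses the sample string with the spaces from the question's copy removed (the answer's patterns assume no spaces around =).

```r
library(stringr)

# Sample INFO string from the question, spaces around "=" and "+" removed
info <- "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693"

# [yes]+ means "one or more of y, e or s", in any order
str_extract(c("yyyyyyyyy", "eyyyyss"), "[yes]+")

# The question's pattern: the class [MQRankSum]+ is not the word MQRankSum,
# so the leftmost match starts inside "BaseQRankSum"
bad <- str_extract(info, "[MQRankSum]+=[:punct:]+[0-9]+[.]+[0-9]+")
bad
```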
What you want is to match the literal word MQRankSum, then =, and then one or more characters other than ;:
str_extract(nf1$Info,"MQRankSum=[^;]+")
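A minimal sketch applying this pattern to a one-row stand-in for the question's data frame. The data frame below is built here for illustration only (the real nf1 is not shown in the question), and stringr is assumed to be installed.

```r
library(stringr)

# Hypothetical one-row stand-in for the question's nf1 data frame
nf1 <- data.frame(
  Info = "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693",
  stringsAsFactors = FALSE
)

# Literal key, literal "=", then one or more characters that are not ";"
mq <- str_extract(nf1$Info, "MQRankSum=[^;]+")
mq
```

Because [^;]+ stops at the next ;, the match ends at the field boundary, whatever the value's format (integer, decimal or scientific notation).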
If you want to exclude MQRankSum= from the match, use a lookbehind:
str_extract(nf1$Info,"(?<=MQRankSum=)[^;]+")
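The (?<=MQRankSum=) positive lookbehind makes sure the text MQRankSum= appears immediately to the left of the current position, and only then does the pattern match one or more characters other than ;. Because a lookbehind consumes no characters, MQRankSum= is not part of the returned value.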
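A short sketch of the lookbehind version run over a vector, assuming stringr is installed. The second record is hypothetical, added only to show what happens when the key is missing; the numeric conversion at the end is an optional extra step, not part of the original answer.

```r
library(stringr)

info <- c(
  "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693",
  "AC=2;AF=1.00;AN=2;DP=7;QD=5.00"  # hypothetical record without MQRankSum
)

# (?<=MQRankSum=) must match right before the value but is zero-width,
# so only the value itself is returned; records without the key give NA
val <- str_extract(info, "(?<=MQRankSum=)[^;]+")
val

# Optional: convert the extracted text to numbers (NA stays NA)
as.numeric(val)
```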