Problem description

I have a data frame with a column:

nf1$Info = AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693

I'm trying to extract a specific value from this column.

For example, I'm interested in "MQRankSum", and I used:

str_extract(nf1$Info,"[MQRankSum]+=[:punct:]+[0-9]+[.]+[0-9]+")

It returns the value for BaseQRankSum instead of MQRankSum.

Recommended answer

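Enclosing characters in square brackets creates a character class that matches any one of the listed characters, so [yes]+ matches yyyyyyyyy, eyyyyss, etc. For the same reason, [MQRankSum]+ matches any run of the letters M, Q, R, a, n, k, S, u, m, including the QRankSum tail of BaseQRankSum, which appears earlier in the string than MQRankSum.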
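To make the difference concrete, here is a small sketch comparing a character class with a literal pattern. It assumes the stringr package is installed; the variable names are illustrative, and the input is a shortened copy of the Info string from the question.

```r
library(stringr)

# A character class matches any run of the listed characters, in any order
class_demo <- str_extract("eyyyyss", "[yes]+")
class_demo  # "eyyyyss"

# Shortened copy of the Info string from the question
info <- "BaseQRankSum=-1.026e+00;MQ=28.25;MQRankSum=-1.026e+00"

# Character class: the first run of class letters followed by "=" is the
# "QRankSum" tail of BaseQRankSum, so the wrong field is picked up
class_hit <- str_extract(info, "[MQRankSum]+=[^;]+")
class_hit  # "QRankSum=-1.026e+00"

# Literal text: only the exact sequence "MQRankSum=" can match
literal_hit <- str_extract(info, "MQRankSum=[^;]+")
literal_hit  # "MQRankSum=-1.026e+00"
```

Both fields happen to hold the same number in the question's data, so the matched key is what reveals which field was actually found.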

What you want to do is match the literal text MQRankSum, then =, and then any characters other than ;:

str_extract(nf1$Info,"MQRankSum=[^;]+")
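As a rough, self-contained check of this pattern, the sketch below rebuilds a data frame like the one in the question. The first row is the question's Info string; the second row and the new column name MQRankSum_field are hypothetical, added only to show what happens when the field is absent.

```r
library(stringr)

nf1 <- data.frame(
  Info = c(
    # Row 1: the Info string from the question
    paste0(
      "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;",
      "DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;",
      "MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693"
    ),
    # Row 2: hypothetical record without an MQRankSum field
    "AC=2;DP=7"
  ),
  stringsAsFactors = FALSE
)

# str_extract is vectorised over the column: one result per row
nf1$MQRankSum_field <- str_extract(nf1$Info, "MQRankSum=[^;]+")

nf1$MQRankSum_field[1]  # "MQRankSum=-1.026e+00"
nf1$MQRankSum_field[2]  # NA, because nothing matched
```

Rows where the pattern finds nothing come back as NA rather than raising an error, so the result can be assigned straight into a new column.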

If you want to exclude MQRankSum= from the match, use a lookbehind:

str_extract(nf1$Info,"(?<=MQRankSum=)[^;]+")
                      ^^^^^^^^^^^^^^^

The (?<=MQRankSum=) positive lookbehind makes sure the text MQRankSum= sits immediately to the left of the current position, and only then does [^;]+ match 1 or more characters other than ;. The lookbehind text itself is not part of the returned match.
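The lookbehind version can be sketched end to end as follows, again assuming stringr is available. Converting the result with as.numeric and the capture-group alternative using str_match are additions for illustration, not part of the original answer.

```r
library(stringr)

# The Info string from the question
info <- paste0(
  "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;",
  "DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;",
  "MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693"
)

# Lookbehind: "MQRankSum=" must precede the match but is not returned
value_chr <- str_extract(info, "(?<=MQRankSum=)[^;]+")
value_chr  # "-1.026e+00"

# The extracted text is still a string; convert it if a number is needed
value_num <- as.numeric(value_chr)
value_num  # -1.026

# Alternative without a lookbehind: a capture group read via str_match,
# where column 1 is the full match and column 2 is the first group
value_grp <- str_match(info, "MQRankSum=([^;]+)")[, 2]
value_grp  # "-1.026e+00"
```

The capture-group form returns the same value and may be easier to read when several fields are being pulled out with similar patterns.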
