Regular expression to include and exclude certain strings in R

Problem description

I am trying to use R to parse through a number of entries. I have two requirements for the entries I want back: I want all the entries that contain the word apple but don't contain the word orange.

For example:

  1. I like apples
  2. I really like apples
  3. I like apples and oranges

I want to get entries 1 and 2 back.

How could I go about using R to do this?

Thanks.

Recommended answer

Using a regular expression, you could do the following.

x <- c('I like apples', 'I really like apples',
       'I like apples and oranges', 'I like oranges and apples',
       'I really like oranges and apples but oranges more')

x[grepl('^((?!.*orange).)*apple.*$', x, perl=TRUE)]
# [1] "I like apples"        "I really like apples"

The group ((?!.*orange).)* consumes the text before apple one character at a time. Before each character, the negative lookahead (?!.*orange) checks that the substring orange does not occur anywhere in the rest of the line; only then does the dot . match a character (any character except a line break). The group is repeated 0 or more times. Next we look for apple, followed by any characters except a line break (0 or more times). Finally, the start and end of line anchors are in place to make sure the whole input is consumed.
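
To see what each piece of the pattern contributes, the pieces can be tried separately. The sketch below is only an illustration: the vector y and its entries are made up here and are not part of the original answer.

```r
# Hypothetical test strings (not from the original answer)
y <- c('I like apples', 'I like apples and oranges', 'apples and oranges')

# Negative lookahead on its own: TRUE when 'orange' appears nowhere in the string
no_orange <- grepl('^(?!.*orange)', y, perl = TRUE)

# Full pattern from the answer
full <- grepl('^((?!.*orange).)*apple.*$', y, perl = TRUE)

print(no_orange)
# [1]  TRUE FALSE FALSE
print(full)
# [1]  TRUE FALSE  TRUE
```

Note that the last string is accepted by the full pattern even though it contains orange. The lookahead only guards characters consumed before apple, and when apple sits at the very start of the line there are none, so the group simply repeats zero times.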

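UPDATE: If performance is an issue, you could use the following instead. The negative lookahead runs once at the start of the line to rule out orange, and the rest of the pattern then requires apple.

x[grepl('^(?!.*orange).*apple', x, perl=TRUE)]
# [1] "I like apples"        "I really like apples"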
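
If a single pattern is not a hard requirement, the same two conditions can also be written as two separate grepl calls combined with logical operators. This is an alternative sketch rather than part of the original answer; fixed = TRUE is used on the assumption that apple and orange should be treated as literal substrings.

```r
x <- c('I like apples', 'I really like apples',
       'I like apples and oranges', 'I like oranges and apples',
       'I really like oranges and apples but oranges more')

# Keep entries that contain 'apple' and do not contain 'orange'
keep <- grepl('apple', x, fixed = TRUE) & !grepl('orange', x, fixed = TRUE)
result <- x[keep]

print(result)
# [1] "I like apples"        "I really like apples"
```

Splitting the test this way avoids lookaheads entirely, which tends to be easier to read and to extend with further include or exclude terms.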
