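I want to filter the rows of a data frame using a list of sub-words. I know this could be done by building a corpus, but I would rather avoid that unnecessary step. I first tried match, but it only finds exact matches. I then tried grep, but grep does not accept more than one word as its pattern.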

 a=sample(c("clint","offshor","V1fax","12mobile"),10,replace=TRUE)
 z=data.frame(a)
 z
          a
1     V1fax
2     V1fax
3  12mobile
4  12mobile
5     V1fax
6     clint
7   offshor
8     clint
9     clint
10 12mobile

d=z[is.na(match(tolower(z[,1]),c("fax","mobile","except","talwade"))),]

grep(c("fax","mobile","except","talwade"),tolower(z[,1]))
    [1] 1 2 5
Warning message:
In grep(c("fax", "mobile", "except", "talwade"  :
  argument 'pattern' has length > 1 and only the first element will be used

The desired output is:
z
       a
1     clint
2   offshor
3     clint
4     clint

Is there an efficient way to filter on a list of sub-words and get the output shown above?

Best answer

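You can do this with grep. You just need to join the words with the regular expression OR operator, i.e. |, so that they form a single pattern: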

grep(  paste( c("fax","mobile","except","talwade") , collapse = "|" ) , tolower(z[,1]) )
# [1] 1 2 3 4 5 10


#  The pattern...
paste( c("fax","mobile","except","talwade") , collapse = "|" )
# [1] "fax|mobile|except|talwade"
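Note that the grep call above returns the indices of the rows that do match, whereas the question asks for the rows that do not. One possible way to get from one to the other, as a minimal sketch, is to negate grepl. The data frame is rebuilt here from the printed rows in the question instead of being sampled, so that the result is reproducible; the names words, pattern, keep, and d are only illustrative.

```r
# Rebuild the example data from the rows printed in the question
# (fixed values instead of sample(), so the result is reproducible).
z <- data.frame(
  a = c("V1fax", "V1fax", "12mobile", "12mobile", "V1fax",
        "clint", "offshor", "clint", "clint", "12mobile"),
  stringsAsFactors = FALSE
)

# Collapse the sub-words into one regular expression: "fax|mobile|except|talwade"
words   <- c("fax", "mobile", "except", "talwade")
pattern <- paste(words, collapse = "|")

# grepl() returns one TRUE/FALSE per row, so it can be negated directly.
# ignore.case = TRUE replaces the explicit tolower() call.
keep <- !grepl(pattern, z$a, ignore.case = TRUE)

# drop = FALSE keeps the result a data frame even though z has a single column.
d <- z[keep, , drop = FALSE]
rownames(d) <- NULL  # renumber the rows from 1, as in the desired output

d$a
# [1] "clint"   "offshor" "clint"   "clint"
```

Negating grepl is used here instead of z[-grep(pattern, z$a), ] because the negative-index form selects zero rows when no row matches the pattern. This sketch also assumes the sub-words contain no regular expression metacharacters; if they might, they would need escaping before being pasted together.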

10-08 00:31