Problem description
I have a data frame:

    d <- data.frame(
      name = c("brown cat", "blue cat", "big lion", "tall tiger",
               "black panther", "short cat", "red bird", "short bird stuffed",
               "big eagle", "bad sparrow", "dog fish", "head dog",
               "brown yorkie", "lab short bulldog"),
      label = 1:14
    )
I'd like to search the name column, and if any of the words "cat", "lion", "tiger", or "panther" appear, I want to assign the character string "feline" to the corresponding row of a new column, species. If the words "bird", "eagle", or "sparrow" appear, I want to assign "avian" to the corresponding row of species. If the words "dog", "yorkie", or "bulldog" appear, I want to assign "canine" to the corresponding row of species.

Ideally, I'd store this in a list or something similar that I can keep at the beginning of the script, because as new variants of the species show up in the name column, it would be nice to have easy access to update what qualifies as a feline, avian, or canine.

This question is almost answered here (How to create new column in dataframe based on partial string matching other column in R), but that answer doesn't address the multiple-name twist present in this problem.
Solution

There may be a more elegant solution than this, but you could use grep with | to specify alternative matches:

    d[grep("cat|lion|tiger|panther", d$name), "species"] <- "feline"
    d[grep("bird|eagle|sparrow", d$name), "species"] <- "avian"
    d[grep("dog|yorkie", d$name), "species"] <- "canine"
I've assumed you meant "avian", and left out "bulldog" since it contains "dog".
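As a quick illustration of why "bulldog" can be dropped: regular-expression matching looks for the pattern anywhere in the string, so "dog" already matches inside "bulldog". The vector x below is just a small sample taken from the question's data.

```r
# Regular-expression matching finds substrings, so "dog" matches inside "bulldog".
x <- c("lab short bulldog", "head dog", "brown yorkie")
has_dog <- grepl("dog", x)
print(has_dog)
```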
You might want to add ignore.case = TRUE to the grep calls.

Output:

    #                  name label species
    #1            brown cat     1  feline
    #2             blue cat     2  feline
    #3             big lion     3  feline
    #4           tall tiger     4  feline
    #5        black panther     5  feline
    #6            short cat     6  feline
    #7             red bird     7   avian
    #8   short bird stuffed     8   avian
    #9            big eagle     9   avian
    #10         bad sparrow    10   avian
    #11            dog fish    11  canine
    #12            head dog    12  canine
    #13        brown yorkie    13  canine
    #14   lab short bulldog    14  canine
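The question also asks for a lookup that can sit at the top of the script. One possible sketch, not the only way to do it, is to keep the keywords in a named list and build each pattern with paste(). The name species_keywords is made up for this example; the rest uses base R only.

```r
# Hypothetical lookup table kept at the top of the script:
# species label -> keywords that identify it. Extend as new variants appear.
species_keywords <- list(
  feline = c("cat", "lion", "tiger", "panther"),
  avian  = c("bird", "eagle", "sparrow"),
  canine = c("dog", "yorkie", "bulldog")
)

d <- data.frame(
  name = c("brown cat", "blue cat", "big lion", "tall tiger",
           "black panther", "short cat", "red bird", "short bird stuffed",
           "big eagle", "bad sparrow", "dog fish", "head dog",
           "brown yorkie", "lab short bulldog"),
  label = 1:14
)

# Start with NA so rows that match no keyword stay visibly unassigned.
d$species <- NA_character_

for (sp in names(species_keywords)) {
  # Collapse the keywords into one alternation, e.g. "cat|lion|tiger|panther".
  pattern <- paste(species_keywords[[sp]], collapse = "|")
  # grepl() returns a logical vector, which is safe to index with
  # even when nothing matches.
  d$species[grepl(pattern, d$name, ignore.case = TRUE)] <- sp
}

print(d)
```

If a name matches keywords from more than one group, the group listed later in species_keywords wins, because each pass of the loop overwrites earlier assignments. Reorder the list if a different priority is needed.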