我正在尝试从R中的一堆文本中找到主人和访客的名字。
示例文本 -
dat = data.frame(Series = c('England in Australia ODI Match',
'Prudential Trophy (Australia in England)',
'Pakistan in New Zealand ODI Match',
'Prudential Trophy (New Zealand in England)',
'Prudential Trophy (West Indies in England)',
'Australia in New Zealand ODI Series',
'Texaco Trophy (Australia in England)'))
我想要创建两个新列。所需的输出如下所示-
Visitor Host
England Australia
Australia England
Pakistan New Zealand
New Zealand England
West Indies England
Australia New Zealand
我正在尝试以下功能,但功能不完整。
dat$Host = sub(" in.*", "", dat$Series)
最佳答案
这是您想要做的事情:
re = regexpr("((New |West )?\\w+) in ((New |West )?\\w+)", dat$Series)
rm = regmatches(dat$Series, re)
d = do.call(rbind,strsplit(rm, " in "))
colnames(d) = c("Visitor","Host")
输出:
Visitor Host
[1,] "England" "Australia"
[2,] "Australia" "England"
[3,] "Pakistan" "New Zealand"
[4,] "New Zealand" "England"
[5,] "West Indies" "England"
[6,] "Australia" "New Zealand"
[7,] "Australia" "England"