Problem description
I have a vector of strings, myStrings, in R that looks something like:
[1] download file from `http://example.com`
[2] this is the link to my website `another url`
[3] go to `another url` from more info.
where another url is a valid http URL (Stack Overflow will not let me insert more than one URL, which is why I'm writing another url instead). I want to remove all the URLs from myStrings so that it looks like:
[1] download file from
[2] this is the link to my website
[3] go to from more info.
I've tried many functions in the stringr package, but nothing works.
Recommended answer
You can use gsub with a regular expression to match URLs.
Set up the vector:
x <- c(
"download file from http://example.com",
"this is the link to my website http://example.com",
"go to http://example.com from more info.",
"Another url ftp://www.example.com",
"And https://www.example.net"
)
Remove all the URLs from each string:
gsub(" ?(f|ht)tp(s?)://(.*)[.][a-z]+", "", x)
# [1] "download file from" "this is the link to my website"
# [3] "go to from more info." "Another url"
# [5] "And"
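One thing to watch for: the (.*) in that pattern is greedy, so a string containing more than one URL may lose the text between them. The sketch below uses a hypothetical input y (not from the question) to compare the original pattern with one possible alternative that stops each match at the next whitespace character. It is an illustration under that assumption, not a general-purpose URL matcher; it will also swallow punctuation that directly follows a URL.

```r
# Hypothetical input with two URLs in one string (not from the original question).
y <- "see http://a.example.com and http://b.example.org now"

# Original pattern: the greedy (.*) extends to the last ".letters" in the string,
# so the word between the two URLs is removed as well.
greedy <- gsub(" ?(f|ht)tp(s?)://(.*)[.][a-z]+", "", y)

# Alternative sketch: each match ends at the next whitespace character.
bounded <- gsub(" ?(f|ht)tps?://[^[:space:]]+", "", y)

print(greedy)
# [1] "see now"
print(bounded)
# [1] "see and now"
```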
Update: It would be best if you could post a few different URLs so we know what we're working with. But I think this regular expression will work for the URLs you mentioned in the comments:
" ?(f|ht)(tp)(s?)(://)(.*)[.|/](.*)"
The expression above, explained:
" ?" : an optional space
(f|ht) : matches "f" or "ht"
(tp) : matches "tp"
(s?) : optionally matches "s" if it's there
(://) : matches "://"
(.*) : matches every character (everything) up to
[.|/] : a period or a forward slash (inside a character class the | is literal, so a pipe character matches here too)
(.*) : then everything after that
I'm not an expert with regular expressions, but I think I explained that correctly.
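As a quick check of what "everything after that" means in practice, here is a small sketch that applies the updated pattern to two of the example strings from earlier. Because the final (.*) runs to the end of the string, any text following the URL is removed along with it, which may or may not be what you want.

```r
# Two of the example strings used earlier in this answer.
x <- c(
  "download file from http://example.com",
  "go to http://example.com from more info."
)

# The updated pattern: the trailing (.*) consumes the rest of the string.
updated <- gsub(" ?(f|ht)(tp)(s?)(://)(.*)[.|/](.*)", "", x)

print(updated)
# [1] "download file from" "go to"
```

If the words after the URL need to be kept, the first pattern in this answer is the better fit for strings like the second one.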
Note: URL shorteners are no longer allowed in SO answers, so I was forced to remove a section while making my most recent edit. See the edit history for that part.