Problem description
I have a URL:
url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"
There are some unwanted characters at the end, such as A or TRE. I want to remove them so the URL looks like this:
url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htm"
How can I remove them?
Recommended answer
If your URL always ends with .htm, .aspx or .php, you can solve it with a simple regex:
url = url[/^(.+\.(htm|aspx|php))(:?.*)$/, 1]
You can test it on Rubular.
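A minimal, self-contained sketch of the one-liner above, applied to the URL from the question. The constant name URL_RE and the example.com inputs are hypothetical, added only for illustration; the last input shows that, because nothing checks what follows the extension, a .html URL is also cut back to .htm.

```ruby
# The regex from the answer: group 1 is everything up to and including
# the last ".htm", ".aspx" or ".php"; whatever follows is dropped.
URL_RE = /^(.+\.(htm|aspx|php))(:?.*)$/

url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"
clean = url[URL_RE, 1]
puts clean # prints http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htm

# Hypothetical input with the "TRE" junk mentioned in the question
other = "http://example.com/page.phpTRE"
puts other[URL_RE, 1] # prints http://example.com/page.php

# Hypothetical .html input: the regex also cuts it back to ".htm"
html = "http://example.com/page.html"
puts html[URL_RE, 1] # prints http://example.com/page.htm
```

This is only safe under the answer's assumption that every URL really ends in one of the three extensions.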
First I use String#[] with a regex and a capture-group index to get a substring; it works like slice (String#slice is an alias of String#[]). Then comes the regex. From left to right:
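A short sketch of how String#[] behaves when given a regex and a capture-group index, using the same pattern. The .jsp URL is a hypothetical input chosen so that the regex does not match.

```ruby
url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"
re  = /^(.+\.(htm|aspx|php))(:?.*)$/

# str[regexp, n] returns the text matched by capture group n;
# String#slice is an alias of String#[], so both calls agree.
by_index = url[re, 1]
by_slice = url.slice(re, 1)
puts by_index == by_slice # prints true

# Group 2 holds only the extension
puts url[re, 2] # prints htm

# With no match the result is nil, not the original string
# (hypothetical URL with none of the three extensions)
missing = "http://example.com/page.jsp"[re, 1]
p missing # prints nil
```

If some URLs might not end in one of the three extensions, check for nil (or fall back with url[re, 1] || url) before using the result.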
^ # Start of line
( # Capture everything wanted enclosed
.+ # 1 or more of any character
\. # Followed by a literal dot
(htm|aspx|php) # htm or aspx or php
) # Close the wanted part (the URL asked for in the question)
( # Capture the undesirable part
:? # Optional literal colon (redundant, since .* also matches it)
.* # 0 or more of any character
) # Close undesirable part
$ # End of line
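The breakdown above can also be written directly into the pattern with Ruby's extended (x) mode, which ignores whitespace and treats # as the start of a comment. This is a sketch, with a hypothetical constant name URL_RE_X, that should match exactly like the one-line version.

```ruby
# Extended mode (the trailing x flag) ignores whitespace and lets "#"
# start a comment, so the regex can carry the breakdown from the answer.
URL_RE_X = /
  ^                # Start of line
  (                # Group 1: the wanted part
    .+             # 1 or more of any character
    \.             # followed by a literal dot
    (htm|aspx|php) # Group 2: the extension
  )                # Close group 1
  (                # Group 3: the undesirable part
    :?             # optional literal colon, redundant next to .*
    .*             # 0 or more of any character
  )                # Close group 3
  $                # End of line
/x

url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"
m = URL_RE_X.match(url)
puts m[1] # prints http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htm
puts m[3] # prints A
```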