Problem description
I need a regex pattern for finding web page links in HTML.
I first used @"(<a.*?>.*?</a>)" to extract the links (<a>), but I can't get the href from that.
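Keeping the pattern from the question, one way to get at the href is a second pass over each matched element. The sketch below is in JavaScript (the language of the answer's snippet) rather than the question's C#. The second pattern and the function name hrefsFromAnchors are illustrative assumptions, not part of the thread.

```javascript
// The question's pattern as a JavaScript regex literal (the "/" in </a> must be
// escaped here) with the g flag so matchAll can visit every element.
// Group 1 is the whole <a ...>...</a> element; there is no group for href.
const anchorRx = /(<a.*?>.*?<\/a>)/g;

// Hypothetical second-stage pattern: group 1 is the opening quote,
// group 2 the href value, and \1 requires the same quote to close it.
const hrefInTagRx = /href=(["'])(.*?)\1/;

// Runs the question's pattern first, then pulls href out of each element.
function hrefsFromAnchors(html) {
  const hrefs = [];
  for (const m of html.matchAll(anchorRx)) {
    const inner = m[1].match(hrefInTagRx);
    if (inner) hrefs.push(inner[2]);
  }
  return hrefs;
}

const html = `<a href="page.php?id=5">A</a> <a class="x" href='other.php'>B</a>`;
console.log(hrefsFromAnchors(html)); // [ 'page.php?id=5', 'other.php' ]
```

Like the original pattern, .*? does not cross line breaks, so an <a> element split over several lines is skipped.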
My strings are:
<a href="www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="http://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="https://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
<a href="www.example.com/page.php/404" ....></a>
1, 2 and 3 are valid and I need them, but number 4 is not valid for me (? and = are essential).
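If one regex should both extract and filter, one option is to require a ? and a later = inside the quoted value. This is a minimal sketch in JavaScript; the pattern and the name wantedHrefRx are assumptions, not from the thread. It runs against the four sample strings above, with the .... placeholders left out.

```javascript
// Hypothetical filtering pattern: the captured value (group 2) must contain a "?"
// and, somewhere after it, an "=". [^"'] keeps the match inside the quotes.
const wantedHrefRx = /href=(["'])([^"']*\?[^"']*=[^"']*)\1/g;

// The four sample strings from the question, joined into one text.
const input = [
  '<a href="www.example.com/page.php?id=xxxx&name=yyyy"></a>',
  '<a href="http://www.example.com/page.php?id=xxxx&name=yyyy"></a>',
  '<a href="https://www.example.com/page.php?id=xxxx&name=yyyy"></a>',
  '<a href="www.example.com/page.php/404"></a>',
].join("\n");

// Collect group 2 of every match; the /404 link has no "?" and is skipped.
const wanted = [...input.matchAll(wantedHrefRx)].map((m) => m[2]);
console.log(wanted); // logs the first three URLs only
```

Because of [^"'], a double-quoted value that itself contains an apostrophe would not match; for ordinary URLs that trade-off keeps the pattern from running past the closing quote.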
Thanks everyone, but I don't need to parse <a>. I have a list of links in href="abcdef" format.
I need to fetch the href of the links and filter them: the URLs I want must contain ? and =, like page.php?id=5.
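For a list of href="..." strings rather than full HTML, a two-step sketch: extract the quoted value, then keep it only if it has a ? followed somewhere by an =. The sample list and the helper names hrefValue and hasQueryAssignment are hypothetical.

```javascript
// Hypothetical input: a list of strings in href="..." format (either quote style).
const hrefList = [
  'href="www.example.com/page.php?id=5"',
  "href='page.php?id=7&name=abc'",
  'href="www.example.com/page.php/404"',
  'href="search.php?q"',
];

// Step 1: return the value between matching quotes, or null if there is none.
function hrefValue(entry) {
  const m = entry.match(/href=(["'])(.*?)\1/);
  return m ? m[2] : null;
}

// Step 2: true when the URL has a "?" and an "=" somewhere after it.
function hasQueryAssignment(url) {
  const q = url.indexOf("?");
  return q !== -1 && url.indexOf("=", q + 1) !== -1;
}

const favorites = hrefList
  .map(hrefValue)
  .filter((v) => v !== null && hasQueryAssignment(v));

console.log(favorites); // logs the two entries that contain both ? and =
```

Checking for = after the ? rejects values such as search.php?q, which contain a ? but no parameter assignment.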
Thanks!
Recommended answer
I'd recommend using an HTML parser rather than a regex, but here is a regex that captures the value of each link's href attribute in a group. It matches whether double or single quotes are used.
<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1
You can view a full explanation of this regex here.
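The pattern above has no g flag, so a single match stops at the first link. A minimal sketch, assuming the goal is every href in a larger HTML string, adds the flag and reads group 2 from each result of String.prototype.matchAll (which throws a TypeError for a regex without g). The sample HTML is made up.

```javascript
// The answer's pattern with the g flag added. Group 1 is the opening quote,
// group 2 is the href value, and \1 requires the same quote to close it.
const linkRxAll = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/g;

const html = `
<a href="www.example.com/page.php?id=xxxx&name=yyyy">one</a>
<a class="nav" href='https://www.example.com/page.php?id=1'>two</a>
<a title="no link">three</a>
`;

// Array.from with a mapping function keeps only group 2 of each match.
const hrefs = Array.from(html.matchAll(linkRxAll), (m) => m[2]);
console.log(hrefs);
```

The third element has no href; because [^>]*? cannot cross a >, the pattern cannot borrow an href from a different tag, so that element is simply skipped.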
Snippet playground:
const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/;
const textToMatchInput = document.querySelector('[name=textToMatch]');
document.querySelector('button').addEventListener('click', () => {
  // Logs the match array ([0] matched text, [1] quote, [2] href value) or null.
  console.log(textToMatchInput.value.match(linkRx));
});
<label>
Text to match:
<input type="text" name="textToMatch" value='<a href="google.com"'>
<button>Match</button>
</label>
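The snippet logs the whole array returned by match(). If only the URL is wanted, group 2 is the part to read, and match() returns null when nothing matches. A small sketch with a hypothetical getHref helper:

```javascript
// Same pattern as the snippet (no g flag), so match() returns the first match
// with its groups: [0] the matched text, [1] the quote, [2] the href value.
const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/;

// Hypothetical helper: returns only the href value, or null when nothing matches.
function getHref(text) {
  const m = text.match(linkRx);
  return m ? m[2] : null;
}

console.log(getHref('<a href="google.com"'));           // google.com
console.log(getHref("<a id='x' href='page.php?id=5'>")); // page.php?id=5
console.log(getHref("<p>no anchor here</p>"));          // null
```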