Regular expression for finding the href value of an <a> link

Question

I need a regex pattern for finding web page links in HTML.

I first used @"(<a.*?>.*?</a>)" to extract the links (<a>), but I can't get the href from that.

My strings are:

  1. <a href="www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
  2. <a href="http://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
  3. <a href="https://www.example.com/page.php?id=xxxx&name=yyyy" ....></a>
  4. <a href="www.example.com/page.php/404" ....></a>

1, 2 and 3 are valid and I need them, but number 4 is not valid for me (? and = are essential).

Thanks everyone, but I don't need to parse <a>. I have a list of links in href="abcdef" format.

I need to fetch the href of each link and filter it; the URLs I want must contain ? and =, like page.php?id=5.

Thanks!

Recommended answer

I'd recommend using an HTML parser over a regex, but here is a regex that creates a capturing group over the value of the href attribute of each link. It matches whether double or single quotes are used.

<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1

A full explanation of this regex is linked from the original answer.
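To show how the capturing groups are used, here is a minimal sketch that applies the same pattern with the g flag to collect every href value from a string. Group 1 holds the quote character and group 2 holds the attribute value. The names linkRxGlobal, extractHrefs and sampleHtml are made up for this illustration and are not part of the original answer.

```javascript
// Same pattern as the answer's, with the g flag so matchAll can return every link.
const linkRxGlobal = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/g;

// Hypothetical helper: returns the href value of every <a> tag found in the input.
function extractHrefs(html) {
  // m[1] is the quote character, m[2] is the attribute value.
  return Array.from(html.matchAll(linkRxGlobal), (m) => m[2]);
}

// Made-up sample input: one double-quoted href, one single-quoted href
// that is preceded by another attribute.
const sampleHtml = `
  <a href="http://www.example.com/page.php?id=1&name=a" class="x">one</a>
  <a class="y" href='www.example.com/page.php/404'>two</a>
`;

console.log(extractHrefs(sampleHtml));
```

Like any regex-based approach, this can be fooled by unusual markup (for example, an unquoted href or a > inside an attribute value), which is why a parser is the safer choice for arbitrary HTML.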

Snippet playground:

const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/;
const textToMatchInput = document.querySelector('[name=textToMatch]');

document.querySelector('button').addEventListener('click', () => {
  console.log(textToMatchInput.value.match(linkRx));
});

<label>
  Text to match:
  <input type="text" name="textToMatch" value='<a href="google.com"'>

  <button>Match</button>
 </label>
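The pattern above only captures the href value; the question also asks to keep just the URLs that contain ? and =. One possible way to do that, assuming the input is a list of strings in href="..." format as the asker describes, is sketched below. The names hrefRx, hasQueryParam, filterHrefs and samples are hypothetical, and the sample strings are taken from the question.

```javascript
// Pulls the value out of a bare href="..." (or href='...') string.
const hrefRx = /href=(["'])(.*?)\1/;

// Hypothetical helper: true when the URL contains a '?' followed later by an '='.
function hasQueryParam(url) {
  const q = url.indexOf('?');
  return q !== -1 && url.indexOf('=', q + 1) !== -1;
}

// Extracts each href value, drops strings that do not match, then applies the filter.
function filterHrefs(items) {
  return items
    .map((s) => s.match(hrefRx))
    .filter((m) => m !== null)
    .map((m) => m[2])
    .filter(hasQueryParam);
}

const samples = [
  'href="www.example.com/page.php?id=xxxx&name=yyyy"',
  'href="http://www.example.com/page.php?id=xxxx&name=yyyy"',
  'href="https://www.example.com/page.php?id=xxxx&name=yyyy"',
  'href="www.example.com/page.php/404"',
];

// Keeps the first three URLs and drops the /404 one.
console.log(filterHrefs(samples));
```

Keeping the extraction and the filter as separate steps means the "must contain ? and =" rule can be changed without touching the regex.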
