Problem description
I have a paragraph of text in a JavaScript variable called 'input_content', and that text contains multiple anchor tags/links. I would like to match all of the anchor tags, extract the anchor text and URL from each, and put them into an array like (or similar to) this:
Array
(
[0] => Array
(
[0] => <a href="http://yahoo.com">Yahoo</a>
[1] => http://yahoo.com
[2] => Yahoo
)
[1] => Array
(
[0] => <a href="http://google.com">Google</a>
[1] => http://google.com
[2] => Google
)
)
I've taken a crack at it (http://pastie.org/339755), but I am stumped beyond this point. Thanks for the help!
Recommended answer
var matches = [];
input_content.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g, function () {
    matches.push(Array.prototype.slice.call(arguments, 1, 4));
});
This assumes that your anchors will always be in the form <a href="...">...</a>, i.e. it won't work if there are any other attributes (for example, target). The regular expression can be improved to accommodate this.
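One possible way to relax the pattern so that extra attributes are tolerated is sketched below. This is an illustration rather than part of the original answer: the helper name extractAnchors and the sample string are made up, and the pattern still assumes a double-quoted href, no ">" inside attribute values, and plain text (no nested tags) inside the anchor.

```javascript
// Hypothetical helper (not from the original answer): collects
// [whole anchor, href, text] triples, allowing attributes before or after href.
function extractAnchors(html) {
    // <a, optional other attributes, whitespace + href="...",
    // optional trailing attributes, >, anchor text, </a>
    var pattern = /(<a\b[^>]*?\shref="([^"]+)"[^>]*>([^<]+)<\/a>)/gi;
    var matches = [];
    html.replace(pattern, function (whole, anchor, href, text) {
        matches.push([anchor, href, text]);
    });
    return matches;
}

// Made-up sample input with attributes on either side of href.
var sample = 'x <a target="_blank" href="http://yahoo.com">Yahoo</a> y ' +
             '<a href="http://google.com" class="ext">Google</a>';
var result = extractAnchors(sample);
```

For arbitrary real-world HTML a regular expression will remain fragile, so in a browser it may be more robust to parse the markup and read the href and text from the resulting elements.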
To break down the regular expression:
/ -> start regular expression
[^<]* -> skip all characters until the first <
( -> start capturing first token
<a href=" -> capture first bit of anchor
( -> start capturing second token
[^"]+ -> capture all characters until a "
) -> end capturing second token
"> -> capture more of the anchor
( -> start capturing third token
[^<]+ -> capture all characters until a <
) -> end capturing third token
<\/a> -> capture last bit of anchor
) -> end capturing first token
/g -> end regular expression, add global flag to match all anchors in string
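To see how the three capture groups line up with this breakdown, a single match can be inspected with exec. This is a small sketch using a made-up sample string, not part of the original answer:

```javascript
var pattern = /[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g;
var sample = 'blah <a href="http://yahoo.com">Yahoo</a>';

var m = pattern.exec(sample);
// m[0] is the full match, including the skipped leading text "blah "
// m[1] is the first token: the whole anchor
// m[2] is the second token: the href value
// m[3] is the third token: the anchor text
var groups = [m[1], m[2], m[3]];
```

Note that m[0] is not the same as m[1] here, because the leading [^<]* is part of the match but sits outside the first capturing group.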
Each call to our anonymous function will receive three tokens as the second, third and fourth arguments, namely arguments[1], arguments[2], arguments[3]:
- arguments[1] is the entire anchor
- arguments[2] is the href part
- arguments[3] is the text inside
We'll use a hack to push these three arguments as a new array into our main matches array. The arguments built-in variable is not a true JavaScript Array, so we'll have to apply the slice Array method on it to extract the items we want:
Array.prototype.slice.call(arguments, 1, 4)
This will extract items from arguments starting at index 1 and ending (not inclusive) at index 4.
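In environments that support ES2020, the same triples can probably be collected without the arguments trick by using String.prototype.matchAll, since each match it yields is a real array. A minimal sketch, assuming such an environment and using a made-up input string:

```javascript
var input = 'blah <a href="http://yahoo.com">Yahoo</a> blah ' +
            '<a href="http://google.com">Google</a> blah';
var pattern = /[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g;

// Each match m is an Array: m[0] is the full match, m[1]..m[3] are the tokens.
// slice(1, 4) keeps indexes 1, 2 and 3, just like the call above.
var viaMatchAll = Array.from(input.matchAll(pattern), function (m) {
    return m.slice(1, 4);
});
```

matchAll requires the global flag on the regular expression, which the pattern already has.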
var input_content = "blah \
<a href=\"http://yahoo.com\">Yahoo</a> \
blah \
<a href=\"http://google.com\">Google</a> \
blah";
var matches = [];
input_content.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g, function () {
    matches.push(Array.prototype.slice.call(arguments, 1, 4));
});
alert(matches.join("\n"));
This gives:
<a href="http://yahoo.com">Yahoo</a>,http://yahoo.com,Yahoo
<a href="http://google.com">Google</a>,http://google.com,Google