How do I extract URLs and link text from HTML in Perl?

Problem description

I previously asked how to do this in Groovy. However, now I'm rewriting my app in Perl because of all the CPAN libraries.

If the page contained these links:

<a href="http://www.google.com">Google</a>

<a href="http://www.apple.com">Apple</a>

The output would be:

Google, http://www.google.com
Apple, http://www.apple.com

What is the best way to do this in Perl?

Solution

Take a look at the WWW::Mechanize module for this. It fetches web pages for you and then gives you an easy-to-work-with list of the links on each page.

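use WWW::Mechanize;

# $some_url is assumed to already hold the address of the page to scan.
my $mech = WWW::Mechanize->new();
$mech->get( $some_url );

# links() returns one WWW::Mechanize::Link object per link on the page.
my @links = $mech->links();
for my $link ( @links ) {
    printf "%s, %s\n", $link->text, $link->url;
}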

Pretty simple, and if you're looking to navigate to other URLs on that page, it's even simpler.
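As a rough illustration of that navigation step, the sketch below fetches a page, looks up a link by its visible text with find_link, and then requests the link's target. The helper name follow_link_by_text and the command-line handling are hypothetical and only there to make the example self-contained. It also assumes a reasonably recent WWW::Mechanize, where autocheck is on by default.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch $start_url, then follow the first link whose text is exactly $text.
# Returns the address of the page reached, or undef if no such link exists.
# (Hypothetical helper, not part of WWW::Mechanize itself.)
sub follow_link_by_text {
    my ( $mech, $start_url, $text ) = @_;
    $mech->get($start_url);
    my $link = $mech->find_link( text => $text ) or return;
    $mech->get( $link->url_abs );    # url_abs resolves relative hrefs
    return $mech->uri->as_string;
}

# Hypothetical command-line usage: perl follow.pl <start-url> <link-text>
if ( @ARGV == 2 ) {
    # With autocheck on (the default in recent releases), failed requests die.
    my $mech  = WWW::Mechanize->new();
    my $where = follow_link_by_text( $mech, @ARGV );
    print defined $where ? "Now at $where\n" : "No such link\n";
}
```

The lookup goes through find_link instead of follow_link so that a missing link can be reported as undef; with autocheck enabled, follow_link dies when it finds no match.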

Mech is basically a browser in an object.
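If the HTML is already in a string and nothing needs to be fetched, a full browser object may be more than is required. The sketch below swaps in a different technique from the answer above: it parses the string directly with HTML::TokeParser. The helper name extract_links is hypothetical, and the sample markup is the pair of links from the question.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a list of [text, url] pairs, one per <a href="..."> element in $html.
# (Hypothetical helper written for this example.)
sub extract_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html )
        or die "Cannot create parser";
    my @links;
    # get_tag('a') skips ahead to the next <a ...> start tag.
    while ( my $tag = $parser->get_tag('a') ) {
        my $url = $tag->[1]{href};    # $tag->[1] is the attribute hash
        next unless defined $url;     # skip anchors without an href
        my $text = $parser->get_trimmed_text('/a');
        push @links, [ $text, $url ];
    }
    return @links;
}

my $html = '<a href="http://www.google.com">Google</a>'
         . '<a href="http://www.apple.com">Apple</a>';

# Prints:
#   Google, http://www.google.com
#   Apple, http://www.apple.com
printf "%s, %s\n", @$_ for extract_links($html);
```

Unlike the Mechanize version, this leaves relative hrefs exactly as they appear in the markup, since there is no page address to resolve them against.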
