How can I extract URLs and link text from HTML in Perl?

Problem description

I previously asked how to do this in Groovy. However, now I'm rewriting my app in Perl because of all the CPAN libraries.

If the page contained these links:

    <a href="http://www.google.com">Google</a>
    <a href="http://www.apple.com">Apple</a>

The output would be:

    Google, http://www.google.com
    Apple, http://www.apple.com

What is the best way to do this in Perl?

Solution

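Take a look at the WWW::Mechanize module for this. It will fetch your web pages for you and then give you easy-to-work-with lists of links.

    use strict;
    use warnings;
    use WWW::Mechanize;

    # Placeholder address: replace with the page you want to scan.
    my $some_url = 'http://www.example.com/';

    my $mech = WWW::Mechanize->new();
    $mech->get( $some_url );

    # links() returns one WWW::Mechanize::Link object per link on the page.
    my @links = $mech->links();
    for my $link ( @links ) {
        printf "%s, %s\n", $link->text, $link->url;
    }
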
Pretty simple, and if you're looking to navigate to other URLs on that page, it's even simpler.
Mech is basically a browser in an object.
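
To try the approach above without touching the network, here is a minimal sketch that feeds the two links from the question to Mech through a temporary local file. The helper name extract_links and the temp-file detour are assumptions made for illustration; they are not part of WWW::Mechanize, and this is only one way to test the idea.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical helper (not part of WWW::Mechanize): returns a list of
# [text, url] pairs, one for every link found in the given HTML string.
sub extract_links {
    my ($html) = @_;

    # Write the HTML to a temporary .html file so that Mech can load it
    # through a file:// URL instead of fetching it over the network.
    my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
    print {$fh} $html;
    close $fh or die "Cannot close $path: $!";

    my $mech = WWW::Mechanize->new();
    $mech->get( URI::file->new_abs($path)->as_string );

    # url() gives the href as written in the page; text() gives the link text.
    return map { [ $_->text, $_->url ] } $mech->links();
}

# Sample page built from the two links in the question.
my $html = <<'HTML';
<html><body>
<a href="http://www.google.com">Google</a>
<a href="http://www.apple.com">Apple</a>
</body></html>
HTML

for my $pair ( extract_links($html) ) {
    printf "%s, %s\n", @$pair;
}
```

Returning the pairs instead of printing them inside the helper keeps the extraction step reusable, for example to filter the list before output. Links without text (such as image-only links) may come back with an undefined text value, so real code would probably want to guard against that.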
