Problem description
Below is my PowerShell code to fetch the links in a web page. Intermittently, I get a "Cannot index into a null array" exception. Is there anything wrong with this code? Any help is appreciated.
$Download = $wc.DownloadString($Link)
$List = $Download -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }
Recommended answer
You don't need to parse anything yourself (and, as was pointed out in the comments, a regex can't reliably parse HTML in the first place). Use Invoke-WebRequest to fetch the page; one of the properties of the object it returns is a collection of all the links on the page, already parsed out for you.
Example:
$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
Invoke-WebRequest -Uri $Link | Select-Object -ExpandProperty links;
Or, if you need just the URLs, you can do it a bit more concisely:
$Link = "https://stackoverflow.com/questions/49418802/getting-links-from-webpage-in-powershell-using-regular-expression";
(Invoke-WebRequest -Uri $Link).links.href;
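As for the intermittent error in the original snippet, a likely cause is that -match leaves $Matches untouched when it fails. $Matches[1] is then read either from a stale earlier match or, if nothing has matched yet, from $null, which raises "Cannot index into a null array". If you do keep the regex, here is a minimal sketch that only reads $Matches after a successful match. The sample HTML string is hypothetical and stands in for the output of $wc.DownloadString($Link):

```powershell
# Hypothetical sample input; in the question this comes from $wc.DownloadString($Link).
$Download = '<html><body><p>intro</p><a href="https://example.com/a">A</a> <a class="x" href="/b">B</a><a name="anchor">no href</a></body></html>'

$List = $Download -split '<a\s+' | ForEach-Object {
    # Read $Matches only when -match succeeded. On failure $Matches keeps its
    # previous value, or is $null if nothing has matched yet.
    if ($_ -match '^href=[''"]([^''">\s]*)') {
        $Matches[1]
    }
}

$List
```

With this sample, only the first anchor is returned. The second one is skipped because its href is not the first attribute, which is one more reason to prefer the parsed Links collection over the regex.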