How do I search a page's HTML and return every substring that starts with substring A and ends with substring B?


Problem description

I'm more familiar with SQL, so I thought I would reach out for help using C#.

My objective is to call a C# script from a SQL Server SSIS package which parses through a webpage for downloadable links starting and ending with 2 substrings that I know will not change.

The webpage is here: PatentsView Data Download

I'd like to find every instance in the HTML that starts with "http://www.patentsview.org/data/" and ends with ".tsv.zip". For the moment this is my main challenge. The next challenges will be (1) saving these as variables or something of the sort in SSIS, (2) downloading them, (3) unzipping them, and (4) loading them into a SQL Server database. For now, though, I am mainly focused on parsing the HTML.

Does anyone have input on how to do this? Please keep in mind that I have never used C# before, but I have a moderate amount of experience coding in other languages.

Best
Nico

What I have tried:

I have tried using third party SSIS components, but I believe using script tasks is the best way.

Solution
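
One possible way to handle the parsing step is to download the page as a string and run a regular expression built from the two fixed substrings. The following is a minimal sketch, not a tested SSIS solution. The class name `LinkExtractor` and the method name `ExtractLinks` are hypothetical, the page address is passed in as a command-line argument rather than assumed, and the sketch assumes the links appear in the HTML as plain attribute values without embedded quotes or whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Returns every distinct substring of html that starts with prefix
    // and ends with suffix. (Hypothetical helper, named for illustration.)
    public static List<string> ExtractLinks(string html, string prefix, string suffix)
    {
        // Regex.Escape makes '.' and similar characters match literally.
        // The middle part excludes quotes, angle brackets and whitespace so a
        // match cannot run from one link into the next; *? is lazy, so the
        // match ends at the first occurrence of the suffix.
        string pattern = Regex.Escape(prefix) + "[^\"'<>\\s]*?" + Regex.Escape(suffix);

        return Regex.Matches(html, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Distinct()
                    .ToList();
    }

    public static void Main(string[] args)
    {
        // args[0] is assumed to hold the address of the download page.
        string html;
        using (var client = new WebClient())
        {
            html = client.DownloadString(args[0]);
        }

        foreach (string link in ExtractLinks(html, "http://www.patentsview.org/data/", ".tsv.zip"))
        {
            Console.WriteLine(link);
        }
    }
}
```

In an SSIS Script Task, the same `ExtractLinks` method could be called from the task's own `Main` method and the resulting list assigned to a package variable for the later download, unzip, and load steps. A regular expression is enough here only because both the prefix and the suffix are fixed; for less regular markup an HTML parser would be the more robust choice.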
