Problem Description
I am trying to write a script that takes in a URL with certain parameters, reads from the resulting web page a list of new URLs, and downloads them locally. I am very new to programming and have never used Python 3, so I am a little lost.
Here is example code to explain further:
import urllib.request

param1 = "value1"  # placeholder parameter values
param2 = "value2"
param3 = "value3"
requestURL = ("http://examplewebpage.com/live2/?target=" + param1
              + "&query=" + param2 + "&other=" + param3)
html_content = urllib.request.urlopen(requestURL).read()
# I don't know where to go from here
# Something that can find when a URL appears on the page and append it to a list
# Then download everything from that list
# this can download something from a link:
# urllib.request.urlretrieve(url, newfilelocation)
The output from the request URL is a very long page, which can be XML or JSON, with a lot of information that is not necessarily needed, so some form of searching is required to find the URLs to download from later. The URLs found on the page lead directly to the needed files (they end in .jpg, .cat, etc.).
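Since the response may be XML or JSON rather than HTML, one format-agnostic sketch is a regular expression over the raw text; the extensions and the sample JSON below are assumptions, not the real response:

```python
import re

def find_file_urls(text, extensions=("jpg", "cat")):
    """Return every http(s) URL in text that ends in one of the given extensions."""
    pattern = r"https?://[^\s\"'<>]+\.(?:%s)" % "|".join(extensions)
    return re.findall(pattern, text)

# Small inline sample standing in for the real XML/JSON response
sample = '{"items": [{"url": "http://example.com/files/photo1.jpg"}, {"url": "http://example.com/data.cat"}]}'
print(find_file_urls(sample))  # prints ['http://example.com/files/photo1.jpg', 'http://example.com/data.cat']
```

With the real response you would pass `html_content.decode()` in place of the sample string.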
Please let me know if you need any other information! My apologies if this is confusing.
Also, ideally I would have the downloaded files all go to a new folder (sub-dir) created for them with the filename as the current date and time, but I think I can figure this part out myself.
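For that sub-directory part, a minimal sketch using only the standard library (the base directory name `downloads` is an assumption):

```python
import os
from datetime import datetime

def make_download_dir(base="downloads"):
    """Create, if needed, a sub-directory named after the current date and time."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    path = os.path.join(base, stamp)
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path

target_dir = make_download_dir()
# Files can then be saved as os.path.join(target_dir, filename)
```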
Recommended Answer
I would recommend checking out BeautifulSoup for parsing the returned page. With it, you can loop through the links, extract the link addresses fairly easily, and append them to a list.
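A minimal sketch of that approach (BeautifulSoup 4, installed with `pip install beautifulsoup4`; the sample HTML and the extension list are assumptions standing in for the real page):

```python
from bs4 import BeautifulSoup

def extract_file_links(html, extensions=(".jpg", ".cat")):
    """Collect href values that point directly at files with the given extensions."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):  # only anchors that have an href
        if a["href"].endswith(extensions):
            links.append(a["href"])
    return links

# Small inline sample standing in for the real page
sample = '<a href="http://x.com/a.jpg">a</a> <a href="http://x.com/page.html">b</a>'
print(extract_file_links(sample))  # prints ['http://x.com/a.jpg']
```

For the real page you would pass `urllib.request.urlopen(requestURL).read()` instead of the sample string, and `urllib.request.urlretrieve(url, path)` can then download each link in the resulting list.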