Problem description
I'm writing a script to download multiple FLAC files from a website. I'm using Beautiful Soup to get the FLAC links and urlopen to download them.
I want BS to search for links that end in .flac. I don't know the file names, only the extension (e.g. one file is XXX.flac, another is YYY.flac).
The HTML for the FLAC file is here:
<b><a class=location href="/soundtracks/index.php">Soundtracks</a><font class=location> » </font><a href="/soundtracks/highquality/index.php">High Quality Game
Soundtracks [FLAC]</a><font class=location> » </font><a href="/soundtracks/highquality/Metal_Gear_20th_Anniversary/72">Metal Gear 20th Anniversary</a><font class=location> » 01 Metal Gear 20 Years History -Past, Present, Future- Download</font></b><h1>Metal Gear 20th Anniversary Download Links:</h1><a style="font-size: 16px; font-weight:bold;" href="http://50.7.161.234/bks/94/245/Music/[029] MG 20th Anniversary [FLAC]/01 Metal Gear 20 Years History -Past, Present, Future-.flac">Metal Gear 20th Anniversary - 01 Metal Gear 20 Years History -Past, Present, Future-</a> <font face="Verdana" style="font-size: 16px;">Format: FLAC, Size: 76M</font><br> <font face="Verdana" style="font-size: 10px;"><b>Note: If the file starts playing in your browser window, try right-clicking and "Save Target As"</b></font><br>
I tried searching by id with t = soup.find(id="flac"), but I don't get any relevant results. I'm drawing a blank and don't know how to solve it.
How would I get BS to search for and find the file link, and then assign that link to a variable?
import mechanize
import urllib, urllib2, re
from bs4 import BeautifulSoup
####MECHANIZE####
br = mechanize.Browser()
res = br.open("http://www.emuparadise.me/soundtracks/highquality/Metal_Gear_20th_Anniversary/72")
a = 2 #COUNTER FOR LOOP
br.follow_link(text_regex='Download', nr=a)
b = br.geturl() #GETS THE URL
print b
page = urllib2.urlopen(b).read()
soup = BeautifulSoup(page)
soup.prettify()
t = soup.find(id="")
print t
Recommended answer
Your code is trying to match on an id attribute that doesn't exist in the anchor tags linking to those FLACs.
Instead, use a regex to match hrefs that end in .flac:
# The dot is escaped so it matches a literal "." rather than any character
t = soup.find_all(href=re.compile(r"\.flac$"))
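To get from the matched tags to a single link stored in a variable, something like the following might work. It is only a sketch in Python 3, run against a trimmed copy of the HTML from the question rather than the live page. The names SAMPLE_HTML, FLAC_HREF, find_flac_links, flac_url and safe_url are made up for illustration, and the percent-encoding step is an assumption on my part: the href contains spaces and brackets, which urlopen may not accept as-is.

```python
import re
from urllib.parse import quote

from bs4 import BeautifulSoup

# Trimmed stand-in for the download page (assumption: the real page
# has the same shape as the HTML quoted in the question).
SAMPLE_HTML = """
<a class=location href="/soundtracks/index.php">Soundtracks</a>
<h1>Metal Gear 20th Anniversary Download Links:</h1>
<a style="font-size: 16px; font-weight:bold;" href="http://50.7.161.234/bks/94/245/Music/[029] MG 20th Anniversary [FLAC]/01 Metal Gear 20 Years History -Past, Present, Future-.flac">Metal Gear 20th Anniversary - 01</a>
"""

# Matches any href that ends in a literal ".flac" (case-insensitive).
FLAC_HREF = re.compile(r"\.flac$", re.IGNORECASE)


def find_flac_links(html):
    """Return the href of every <a> tag whose href ends in .flac."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["href"] for tag in soup.find_all("a", href=FLAC_HREF)]


flac_links = find_flac_links(SAMPLE_HTML)

# Assign the first match to a variable, or None if nothing matched.
flac_url = flac_links[0] if flac_links else None

# Percent-encode spaces, brackets and commas, keeping ":" and "/" intact,
# so the URL can be handed to urlopen.
safe_url = quote(flac_url, safe=":/") if flac_url else None
```

find_all returns a list of tags, not strings, so the href has to be read off each tag with tag["href"]. If the page only ever has one FLAC link, soup.find("a", href=FLAC_HREF) would return that single tag (or None) directly.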