Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 5 years ago.

    

I am trying to scrape all of the blog links on this page: http://hypem.com/track/26ed4/Skizzy+Mars+-+Way+I+Live

You click "more" to reveal the links. However, only one link is visible in the HTML source. I am using BeautifulSoup; how can I get the other links?

Best answer

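You can use a requests + BeautifulSoup approach. You only need to simulate the underlying requests that the browser sends to the server when you click the "More blogs" button and scroll down the page.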

Here is code that prints the image titles of all blog entries from the http://hypem.com/blogs page:

from bs4 import BeautifulSoup
import requests


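def extract_blogs(content):
    # name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(content, 'html.parser')
    # each blog entry carries its name in the title attribute of its image
    for link in soup.select('div.directory-blog img'):
        print(link.get('title'))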

# extract blogs from the main page
response = requests.get('http://hypem.com/blogs')
extract_blogs(response.content)

# page through the remaining results until the server returns an empty response
page = 2
url = 'http://hypem.com/inc/serve_sites.php?featured=true&page={page}'

while True:
    response = requests.get(url.format(page=page))
    if not response.content.strip():
        break
    extract_blogs(response.content)
    page += 1


Prints:

Heart and Soul
Avant-Avant
Different Kitchen
Ladywood
Orange Peel
Phonographe Corp
...
Stadiums & Shrines
Caipirinha Lounge
Gorilla Vs. Bear
ISO50 Blog
Fluxblog
Music ( for robots)
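
If you would rather collect the titles than print them, the same "request pages until one comes back empty" loop can be sketched as below. This is only a sketch under assumptions: `parse_titles`, `collect_titles`, `fetch_hypem_page` and the `max_pages` safety cap are hypothetical names that are not part of the code above, and the URLs and CSS selector are simply reused from it, so they may no longer match the live site.

```python
from bs4 import BeautifulSoup
import requests


def parse_titles(html):
    """Return the title attribute of every blog image in an HTML fragment."""
    soup = BeautifulSoup(html, 'html.parser')
    return [img.get('title') for img in soup.select('div.directory-blog img')]


def collect_titles(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty.

    fetch_page is a hypothetical callable: page number in, HTML string out.
    max_pages is an assumed safety cap so the loop always terminates.
    """
    titles = []
    for page in range(1, max_pages + 1):
        html = fetch_page(page)
        if not html.strip():
            break  # an empty body means there are no more results
        titles.extend(parse_titles(html))
    return titles


def fetch_hypem_page(page):
    """Hypothetical fetcher reusing the two URLs from the code above."""
    if page == 1:
        return requests.get('http://hypem.com/blogs', timeout=10).text
    return requests.get(
        'http://hypem.com/inc/serve_sites.php',
        params={'featured': 'true', 'page': page},
        timeout=10,
    ).text
```

Calling `collect_titles(fetch_hypem_page)` would then return the titles as a list. Passing the fetcher in as an argument keeps the parsing and pagination logic testable without touching the network.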


I hope this at least gives you a basic idea of how to scrape page content in cases like this.

10-05 21:12