I am trying to extract a specific class from multiple URLs. The tag and class stay the same, but I need my Python program to scrape all of them when I just supply the links.

Here is my working sample:

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip

url = input('insert URL here: ')
#scrape elements
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

#print titles only
h1 = soup.find("h1", class_="class-headline")
print(h1.get_text())

This works for a single URL, but not for a batch. Thank you for helping me; I have learned a lot from this community.

Best Answer

Keep a list of URLs and iterate over it.

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip

urls = ['https://www.website1.com', 'https://www.website2.com', 'https://www.website3.com', ...]
#scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    #print titles only
    h1 = soup.find("h1", class_="class-headline")
    print(h1.get_text())
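One caveat with the loop above: `soup.find()` returns `None` when a page has no matching tag, so `h1.get_text()` would raise an `AttributeError` and abort the whole batch. A minimal sketch of the guard (using inline HTML strings in place of downloaded pages, so it runs without network access):

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for downloaded pages; the class name matches the question.
pages = [
    '<h1 class="class-headline">First title</h1>',
    '<p>No headline on this page</p>',
]

titles = []
for html in pages:
    soup = BeautifulSoup(html, 'html.parser')
    h1 = soup.find('h1', class_='class-headline')
    # find() returns None when the tag is absent, so guard before calling get_text()
    titles.append(h1.get_text(strip=True) if h1 else None)

print(titles)  # ['First title', None]
```

With real pages you would build each `soup` from `requests.get(url).content` instead, but the `None` check is the same.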

If you want to prompt the user for each site instead, it can be done this way:

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip

#scrape elements
msg = 'Enter URL, or type q and hit enter to exit: '
url = input(msg)
while url != 'q':
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    #print titles only
    h1 = soup.find("h1", class_="class-headline")
    print(h1.get_text())
    url = input(msg)
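If typing each URL by hand gets tedious, another option is to keep the links in a plain text file, one per line, and feed each line through the same `requests`/`BeautifulSoup` steps. A sketch, assuming a hypothetical file named `urls.txt`:

```python
from pathlib import Path

# Hypothetical input file: one URL per line (created here just for demonstration).
Path('urls.txt').write_text(
    'https://www.website1.com\n'
    'https://www.website2.com\n'
)

# Read the file back, skipping blank lines and stray whitespace.
urls = [line.strip()
        for line in Path('urls.txt').read_text().splitlines()
        if line.strip()]
print(urls)  # ['https://www.website1.com', 'https://www.website2.com']
```

Each entry in `urls` can then be passed to `requests.get()` exactly as in the loop above.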

Regarding python - scraping multiple URLs with Beautiful Soup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40629457/
