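I've been trying to check, starting from a YouTube video link, whether the channel/uploader is verified (the blue check badge). The YouTube API doesn't seem to offer this, so I've been trying to scrape it with BeautifulSoup. Here is what I tried: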
from bs4 import BeautifulSoup
import requests
url = "https://www.youtube.com/watch?v=" + video_id
source = requests.get(url).text
bs = BeautifulSoup(source, 'lxml')
# does not work
bs.find_all("div", {"class": "badge badge-style-type-verified style-scope ytd-badge-supported-renderer"})
I tried tracing the hierarchy of HTML elements that leads to the ytd-badge class, and by inspecting the page I found this: html -> body -> ytd-app -> #content -> #page-manager -> ytd-watch-flexy -> #columns -> #primary -> div#primary-inner.style-scope.ytd-watch-flexy -> #meta -> #meta-content -> ytd-video-secondary-info-renderer.style-scope.ytd-watch-flexy -> #container -> div#top-row.style-scope.ytd-video-secondary-info-renderer -> ytd-video-owner-renderer -> div#upload-info.style-scope.ytd-video-owner-renderer -> #channel-name -> ytd-badge-supported-renderer.style-scope.ytd-channel-name
It's crazy long, so I'm wondering how to access it. Is there an easier way to do this in Python? Thanks!
Best answer
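YouTube builds its pages with JavaScript, so the badge markup is not in the raw response that requests returns. Use Requests-HTML to render the page before scraping it.
Install it with pip install requests-html.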
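Because the page contains several videos that carry badges, we need to check that the class containing the badge (badge badge-style-type-verified style-scope ytd-badge-supported-renderer) exists under the channel's info class (style-scope ytd-video-owner-renderer).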
from requests_html import HTMLSession
from bs4 import BeautifulSoup
video_id = ""
video_url = "https://www.youtube.com/watch?v=" + video_id
# Initialize an HTML Session
session = HTMLSession()
# Get the html content
response = session.get(video_url)
# Execute JavaScript
response.html.render(sleep=3)
soup = BeautifulSoup(response.html.html, "lxml")
# Find the channel info element (the selector matches the <ytd-video-owner-renderer> tag)
channel_info = soup.select_one('.style-scope ytd-video-owner-renderer')
# Check if the class that contains the verified badge exists in the channel info element;
# guard against None in case the element was not rendered
if channel_info is not None and channel_info.find('div', class_='badge badge-style-type-verified style-scope ytd-badge-supported-renderer'):
    print('Verified')
else:
    print('NOT verified!')
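The scoped check above can also be written as a single CSS descendant selector, which avoids matching the exact class string and can be tried without network access. The following is only a minimal sketch: the helper name is_channel_verified, the selector constant, and the two sample snippets are made up for illustration, and the markup names are the ones quoted in this thread, which YouTube may change at any time.

```python
from bs4 import BeautifulSoup

# Assumption: the verified badge carries the class "badge-style-type-verified"
# and sits somewhere inside the <ytd-video-owner-renderer> element, as in the
# markup quoted in this thread.
VERIFIED_BADGE_SELECTOR = "ytd-video-owner-renderer .badge-style-type-verified"


def is_channel_verified(html: str) -> bool:
    """Return True if already-rendered watch-page HTML has a verified badge
    inside the channel (owner) block."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(VERIFIED_BADGE_SELECTOR) is not None


# Hand-written sample markup for illustration, not real YouTube output.
SAMPLE_VERIFIED = """
<ytd-video-owner-renderer class="style-scope ytd-watch-flexy">
  <div id="upload-info" class="style-scope ytd-video-owner-renderer">
    <ytd-badge-supported-renderer class="style-scope ytd-channel-name">
      <div class="badge badge-style-type-verified style-scope ytd-badge-supported-renderer"></div>
    </ytd-badge-supported-renderer>
  </div>
</ytd-video-owner-renderer>
"""

# Here the badge belongs to another video on the page, not to the uploader.
SAMPLE_UNVERIFIED = """
<ytd-video-owner-renderer class="style-scope ytd-watch-flexy">
  <div id="upload-info" class="style-scope ytd-video-owner-renderer"></div>
</ytd-video-owner-renderer>
<ytd-compact-video-renderer>
  <div class="badge badge-style-type-verified style-scope ytd-badge-supported-renderer"></div>
</ytd-compact-video-renderer>
"""

if __name__ == "__main__":
    print(is_channel_verified(SAMPLE_VERIFIED))
    print(is_channel_verified(SAMPLE_UNVERIFIED))
```

With a live page, the string to pass in would be response.html.html after response.html.render(...) has run, as in the answer's code.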
Regarding python - scraping the YouTube verified badge with Beautiful Soup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63286395/