Python Selenium: getting web page content added by JavaScript

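Problem description

I use an online music player called NetEase Cloud Music (网易云音乐). My account has several playlists containing thousands of tracks. They are poorly organized and categorized and contain duplicate entries, so I want to export them to an SQL table to sort them out.

I found a way to view a playlist without the client software: click the Share button at the top of the playlist page, then click "Copy link".

However, when the link is opened in any browser outside the client, the playlist is limited to 1,000 tracks.

I found a way around that limit, though: I installed …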

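The first column is the song title, the second is the duration, the third is the artist, and the last is the album.

The text in the first, third, and fourth columns links to the song, artist, and album pages, respectively.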
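
Since those columns are links, each one carries an ID that can serve as a key later. The sketch below assumes the links are hash-fragment hrefs such as #/song?id=5221710 (the form seen in the sample row of this playlist) and that the site root is https://music.163.com/. The helper names resolve_link and resource_id are made up for illustration.

```python
from urllib.parse import parse_qs, urljoin, urlparse

# Assumed site root; the hrefs in the table are fragments relative to it.
BASE_URL = "https://music.163.com/"


def resolve_link(href):
    """Turn a fragment href like '#/song?id=5221710' into a full URL."""
    return urljoin(BASE_URL, href)


def resource_id(href):
    """Extract the numeric id from a fragment href like '#/song?id=5221710'."""
    fragment = urlparse(href).fragment   # '/song?id=5221710'
    query = urlparse(fragment).query     # 'id=5221710'
    return int(parse_qs(query)["id"][0])


print(resolve_link("#/song?id=5221710"))
print(resource_id("#/artist?id=34854"))
```

The numeric ID is more reliable than the title for spotting duplicates, because two different songs can share a name.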

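I know next to nothing about HTML, but I managed to work out the structure of the data.

What we need is the table at XPath //table/tbody. Each row is a child node of it named tr (XPath //table/tbody/tr).

Here is a sample row: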

<td class="left">
  <div class="hd ">
    <span data-res-id="5221710" data-res-type="18" data-res-action="play" data-res-from="13" data-res-data="158624364" class="ply "> </span>
    <span class="num">1</span>
  </div>
</td>
<td>
  <div class="f-cb">
    <div class="tt">
      <div class="ttc">
        <span class="txt">
          <a href="#/song?id=5221710"><b title="Axel F">Axel F</b></a>
        </span>
      </div>
    </div>
  </div>
</td>
<td class=" s-fc3">
  <span class="u-dur candel">03:00</span>
  <div class="opt hshow">
    <a class="u-icn u-icn-81 icn-add" href="javascript:;" title="添加到播放列表" hidefocus="true" data-res-type="18" data-res-id="5221710" data-res-action="addto" data-res-from="13" data-res-data="158624364"></a>
    <span data-res-id="5221710" data-res-type="18" data-res-action="fav" class="icn icn-fav" title="收藏"></span>
    <span data-res-id="5221710" data-res-type="18" data-res-action="share" data-res-name="Greatest Hits Of The Millennium 80's Vol.2" data-res-author="Harold Faltermeyer" data-res-pic="https://p2.music.126.net/tOa6Tizqy755OZE7ITsw_g==/775155697626111.jpg" class="icn icn-share" title="分享">分享</span>
    <span data-res-id="5221710" data-res-type="18" data-res-action="download" class="icn icn-dl" title="下载"></span>
    <span data-res-id="5221710" data-res-type="18" data-res-from="13" data-res-data="158624364" data-res-action="delete" class="icn icn-del" title="删除">删除</span>
  </div>
</td>
<td>
  <div class="text" title="Harold Faltermeyer">
    <span title="Harold Faltermeyer">
      <a href="#/artist?id=34854" hidefocus="true">Harold Faltermeyer</a>
    </span>
  </div>
</td>
<td>
  <div class="text">
    <a href="#/album?id=509819" title="Greatest Hits Of The Millennium 80's Vol.2">Greatest Hits Of The Millennium 80's Vol.2</a>
  </div>
</td>

Relative to each row (//table/tbody/tr), the fields are at:

/td[2]/div/div/div/span/a/b      --> title
/td[2]/div/div/div/span/a/@href  --> song link
/td[3]/span                      --> duration
/td[4]/div/span/a                --> artist
/td[4]/div/span/a/@href          --> artist link
/td[5]/div/a                     --> album
/td[5]/div/a/@href               --> album link
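
The paths above can be checked offline against the sample row before any browser is involved. The sketch below does that with the standard library's xml.etree.ElementTree rather than Selenium. It assumes a trimmed copy of the sample row wrapped in a tr element, and it only works because that snippet happens to be well-formed XML. The function name parse_row is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample row, wrapped in <tr> so it parses as one element.
SAMPLE_ROW = """
<tr>
  <td class="left"><div class="hd "><span class="num">1</span></div></td>
  <td><div class="f-cb"><div class="tt"><div class="ttc"><span class="txt">
    <a href="#/song?id=5221710"><b title="Axel F">Axel F</b></a>
  </span></div></div></div></td>
  <td class=" s-fc3"><span class="u-dur candel">03:00</span></td>
  <td><div class="text" title="Harold Faltermeyer"><span title="Harold Faltermeyer">
    <a href="#/artist?id=34854" hidefocus="true">Harold Faltermeyer</a>
  </span></div></td>
  <td><div class="text">
    <a href="#/album?id=509819" title="Greatest Hits Of The Millennium 80's Vol.2">Greatest Hits Of The Millennium 80's Vol.2</a>
  </div></td>
</tr>
"""


def parse_row(tr):
    """Read one <tr> element into a dict, using paths relative to the row."""
    song = tr.find("td[2]/div/div/div/span/a")
    artist = tr.find("td[4]/div/span/a")
    album = tr.find("td[5]/div/a")
    return {
        "title": song.find("b").text,
        "song_link": song.get("href"),
        "duration": tr.find("td[3]/span").text,
        "artist": artist.text,
        "artist_link": artist.get("href"),
        "album": album.text,
        "album_link": album.get("href"),
    }


row = parse_row(ET.fromstring(SAMPLE_ROW))
print(row)
```

ElementTree supports only a subset of XPath, so the attribute steps (/@href) are read with get("href") instead. With Selenium, the equivalent would be locating the a element and reading its href attribute.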

Waiting for the rows in the top-level document:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
# Maximum time to wait for a condition, in seconds
wait = WebDriverWait(driver, 20)
driver.get('https://music.163.com/#/playlist?id=158624364&userid=126762751')
wait.until(EC.visibility_of_element_located((By.XPATH, '//table/tbody/tr')))
rows = driver.find_elements(By.XPATH, '//table/tbody/tr')

Switching into the page's g_iframe frame before waiting for the rows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
# Maximum time to wait for a condition, in seconds
wait = WebDriverWait(driver, 20)
driver.get('https://music.163.com/#/playlist?id=158624364&userid=126762751')
# The table is rendered inside an iframe, so switch into it first;
# XPath lookups only see the document of the currently selected frame.
iframe = driver.find_element(By.XPATH, '//iframe[@id="g_iframe"]')
driver.switch_to.frame(iframe)
wait.until(EC.visibility_of_element_located((By.XPATH, '//table/tbody/tr')))
rows = driver.find_elements(By.XPATH, '//table/tbody/tr')
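
The stated goal is an SQL table without duplicate entries. One possible way to get there, once the rows have been read into dictionaries, is sketched below with the standard library's sqlite3 module. The table layout, the save_tracks helper, and the choice of the song ID as primary key are all assumptions for illustration, not part of the original question. The sample data repeats the one row shown above to stand in for a duplicate.

```python
import sqlite3

# Hypothetical scraped rows; song_id is the number from the song link.
tracks = [
    {"song_id": 5221710, "title": "Axel F", "duration": "03:00",
     "artist": "Harold Faltermeyer",
     "album": "Greatest Hits Of The Millennium 80's Vol.2"},
    # The same track again, as it might appear in a second playlist.
    {"song_id": 5221710, "title": "Axel F", "duration": "03:00",
     "artist": "Harold Faltermeyer",
     "album": "Greatest Hits Of The Millennium 80's Vol.2"},
]


def save_tracks(conn, rows):
    """Create the table if needed and insert rows, skipping known song IDs."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tracks (
               song_id  INTEGER PRIMARY KEY,
               title    TEXT NOT NULL,
               duration TEXT,
               artist   TEXT,
               album    TEXT
           )"""
    )
    # INSERT OR IGNORE drops any row whose primary key already exists.
    conn.executemany(
        "INSERT OR IGNORE INTO tracks "
        "VALUES (:song_id, :title, :duration, :artist, :album)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path to keep the data
save_tracks(conn, tracks)
stored = conn.execute("SELECT COUNT(*) FROM tracks").fetchone()[0]
print(stored)
```

Keying on the song ID means a track saved in several playlists is stored once. To also record which playlist each track came from, a separate link table would be needed.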
