Scraping JavaScript data inside a web page grid with Selenium and Python

Problem description

My issue is that I need all the data within the grid that contains subdomains from the website https://applipedia.paloaltonetworks.com (the data containing NAME, CATEGORY, SUBCATEGORY, RISK, TECHNOLOGY). For example, in row 5, 2ch has 2 subdomains: 2ch-base and 2ch-posting. In other words, I only want to get the list of all apps that have subdomains.

Right now, whenever I try adding anything to this line:

table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))

I get a timeout error.

Below is the script I have so far. It fetches all the data from the grid, but I need only the apps that have subdomains, together with those subdomains (for example 2ch, 2ch-base, 2ch-posting). Through inspect element I found a pattern: all apps that don't have subdomains have ( ), or we could go by the () field, which is common to all apps that have subdomains. Any help with solving this problem would be much appreciated.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r'/Users/am/Downloads/chromedriver')
driver.maximize_window()

driver.get("https://applipedia.paloaltonetworks.com/")

wait = WebDriverWait(driver, 30)

table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))

for tab in table:
    print(tab.text)

Recommended answer

For the URL https://applipedia.paloaltonetworks.com/, to get the list of all apps that have subdomains, you need to induce WebDriverWait for the desired elements to be visible. You can use the following solution:

Code Block:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--disable-gpu")
# Selenium 4 style: the driver path is passed through Service, and
# `options=` replaces the deprecated `chrome_options=` keyword.
service = Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://applipedia.paloaltonetworks.com/')
# Wait up to 20 seconds for the name links to become visible, skipping
# rows whose ottawagroup attribute is '0' or '2'.
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='btmTable' and @id='dataTable']//tbody[@id='bodyScrollingTable']//tr[not(@ottawagroup='0') and not(@ottawagroup='2')]/td/a")))
for element in elements:
    # innerHTML keeps the surrounding whitespace, as seen in the output
    print(element.get_attribute("innerHTML"))
driver.quit()

Console Output:

DevTools listening on ws://127.0.0.1:12927/devtools/browser/d4a5d576-a4b0-4a3d-959b-9d37aff36fc6

                                2ch


                                51.com


                                adobe-connect


                                adobe-connectnow


                                adobe-creative-cloud


                                aim


                                aim-express


                                ali-wangwang


                                amazon-cloud-drive


                                amazon-music


                                ameba-now


                                assembla


                                autodesk360


                                avaya-webalive


                                bacnet


                                baidu-hi


                                bebo


                                bitbucket


                                boxnet


                                buddybuddy


                                chinaren


                                cisco-spark


                                cloudapp


                                cloudforge


                                cloudinary


                                concur


                                confluence


                                convo


                                cyph


                                daum


                                dcinside


                                diameter


                                dnp3


                                dochub


                                docstoc


                                docusign


                                draw.io


                                dropbox


                                egnyte


                                evernote


                                facebook


                                fetion


                                filestack


                                flickr


                                flixwagon


                                fuze-meeting


                                gatherplace


                                genesys


                                git


                                github


                                gitlab


                                glassdoor


                                globalmeet


                                gmail


                                google-calendar


                                google-cloud-storage


                                google-docs


                                google-hangouts


                                google-plus


                                google-spaces


                                google-talk


                                google-translate


                                google-video


                                gotomypc


                                gotowebinar


                                gtp


                                hadoop


                                hightail


                                hipchat


                                hootsuite


                                huddle


                                hulu


                                hyves


                                iccp


                                icloud


                                iec-60870-5-104


                                imeet


                                imgur


                                instagram


                                instan-t


                                ip-messenger


                                ipsec


                                irc


                                issuu


                                itunes


                                jira


                                join-me


                                jumpshare


                                kaixin


                                kaixin001


                                kakaotalk


                                laiwang


                                landesk


                                linkedin


                                live-mesh


                                lotus-notes


                                lotuslive


                                lucidpress


                                mail.ru


                                mail.ru-agent


                                maytech


                                meebo


                                meetup


                                mega


                                mendeley


                                mercurial


                                mixi


                                modbus


                                ms-ds-smb


                                ms-lync


                                ms-office365


                                ms-onedrive


                                msn


                                myspace


                                nateon-im


                                netease-webdisk


                                netflix


                                ning


                                noteworthy


                                now-tv


                                odnoklassniki


                                onehub


                                owncloud


                                paltalk


                                pastebin


                                pcanywhere


                                pinterest


                                pivotaltracker


                                powow


                                prezi


                                proofhub


                                qik


                                qliksense-cloud


                                qq


                                quip


                                quora


                                rally-software


                                readytalk


                                reddit


                                rediffbol


                                renren


                                rtp


                                salesforce


                                sap-jam


                                screencast


                                scribd


                                second-life


                                secure-data-space


                                sendthisfile


                                service-now


                                sharefile


                                sharepoint


                                sharevault


                                showmax


                                siemens-s7


                                signiant


                                sina-uc


                                sina-weibo


                                skydrive


                                slack


                                slideshare


                                smartsheet


                                snmp


                                softros-messenger


                                solarwinds


                                soundcloud


                                sourceforge


                                spark-im


                                ss7-map


                                stocktwits


                                storify


                                subversion


                                surveymonkey


                                syncplicity


                                tableau


                                teamdrive


                                teamup-calendar


                                teamviewer


                                thwapr


                                torch-browser


                                trello


                                tumblr


                                twitter


                                uc-yun


                                viber


                                vimeo


                                vine


                                virustotal


                                vkontakte


                                vnc


                                watchdox


                                webex


                                wechat


                                weiyun


                                whatsapp


                                windows-azure


                                windows-defender-atp


                                workday


                                yahoo-im


                                yammer


                                youku


                                yousendit


                                youtube


                                yunpan360


                                yy-voice


                                zalo


                                zendesk


                                zenefits


                                zettahost
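
The XPath predicate in the code block keeps only rows whose ottawagroup attribute is neither '0' nor '2', and the printed innerHTML values are padded with whitespace, as the console output shows. Below is a minimal offline sketch of that filter-and-clean step using only the Python standard library, so it can be tried without a browser. The HTML fragment, the attribute value '1', the placeholder app names in the excluded rows, and the helper name apps_with_subdomains are illustrative assumptions, not the site's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; the real page's markup may differ.
SAMPLE = """
<table class="btmTable" id="dataTable">
  <tbody id="bodyScrollingTable">
    <tr ottawagroup="1"><td><a>
        2ch
    </a></td></tr>
    <tr ottawagroup="0"><td><a>
        placeholder-app-a
    </a></td></tr>
    <tr ottawagroup="2"><td><a>
        placeholder-app-b
    </a></td></tr>
    <tr ottawagroup="1"><td><a>
        51.com
    </a></td></tr>
  </tbody>
</table>
"""

# The same attribute values that the XPath predicate rules out.
EXCLUDED = {"0", "2"}


def apps_with_subdomains(html):
    """Return stripped link texts from rows not excluded by ottawagroup."""
    root = ET.fromstring(html)
    names = []
    for row in root.iter("tr"):
        # Rows without the attribute are kept, matching not(@ottawagroup='...').
        if row.get("ottawagroup") in EXCLUDED:
            continue
        for link in row.findall("./td/a"):
            text = (link.text or "").strip()  # drop the innerHTML padding
            if text:
                names.append(text)
    return names


print(apps_with_subdomains(SAMPLE))
```

With Selenium itself, the same cleanup would amount to calling .strip() on each get_attribute("innerHTML") result before printing it.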
