Problem description
I need to extract the grid data for every application that has subdomains from https://applipedia.paloaltonetworks.com (the NAME, CATEGORY, SUBCATEGORY, RISK and TECHNOLOGY columns). For example, in row 5, 2ch has two subdomains: 2ch-base and 2ch-posting. I only want the list of apps that have subdomains like this.
Right now, whenever I try adding anything to this line:
table =wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))
I get a timeout error.
Below is the script I have so far. It fetches all the data from the grid, but I need only the apps that have subdomains, together with those subdomains (for example 2ch, 2ch-base, 2ch-posting). Through inspect element I found a pattern: all apps that don't have subdomains have ( ), or we could go by the ( ) field, which is common to all apps that have subdomains. Any help with this problem would be much appreciated.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path = r'/Users/am/Downloads/chromedriver')
driver.maximize_window()
driver.get("https://applipedia.paloaltonetworks.com/")
wait = WebDriverWait(driver,30)
table =wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))
for tab in table:
    print(tab.text)
Recommended answer
As per the URL https://applipedia.paloaltonetworks.com/, to get the list of all apps having subdomains you need to induce WebDriverWait for the desired elements to be visible. You can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--disable-gpu")
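# options= replaces the deprecated chrome_options= keyword. Selenium 4.10+ also removed
# executable_path; on those versions pass a selenium.webdriver.chrome.service.Service instead.
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')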
driver.get('https://applipedia.paloaltonetworks.com/')
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='btmTable' and @id='dataTable']//tbody[@id='bodyScrollingTable']//tr[not(@ottawagroup='0') and not(@ottawagroup='2')]/td/a")))
for element in elements:
    print(element.get_attribute("innerHTML"))
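As background on the timeout mentioned in the question: WebDriverWait(...).until(condition) keeps calling the condition until it returns something truthy, and raises a timeout error once the time limit passes. An empty list of elements is falsy, so a selector that matches nothing ends in a timeout rather than an empty result. The following is only a rough standard-library sketch of that polling behaviour, not Selenium's actual implementation; wait_until, WaitTimeout and rows_loaded are hypothetical names made up for illustration.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises WaitTimeout if no truthy value is seen before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # an empty list counts as "not ready yet"
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the rows only "appear" on the third poll.
attempts = []


def rows_loaded():
    attempts.append(1)
    return ["2ch"] if len(attempts) >= 3 else []


print(wait_until(rows_loaded, timeout=2.0, poll=0.01))  # prints ['2ch']
```

If the locator never matches (for instance after editing the CSS selector so that it no longer fits the page), the condition stays falsy on every poll and the wait ends with the timeout error.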
Console output of the Selenium script:
DevTools listening on ws://127.0.0.1:12927/devtools/browser/d4a5d576-a4b0-4a3d-959b-9d37aff36fc6
2ch
51.com
adobe-connect
adobe-connectnow
adobe-creative-cloud
aim
aim-express
ali-wangwang
amazon-cloud-drive
amazon-music
ameba-now
assembla
autodesk360
avaya-webalive
bacnet
baidu-hi
bebo
bitbucket
boxnet
buddybuddy
chinaren
cisco-spark
cloudapp
cloudforge
cloudinary
concur
confluence
convo
cyph
daum
dcinside
diameter
dnp3
dochub
docstoc
docusign
draw.io
dropbox
egnyte
evernote
facebook
fetion
filestack
flickr
flixwagon
fuze-meeting
gatherplace
genesys
git
github
gitlab
glassdoor
globalmeet
gmail
google-calendar
google-cloud-storage
google-docs
google-hangouts
google-plus
google-spaces
google-talk
google-translate
google-video
gotomypc
gotowebinar
gtp
hadoop
hightail
hipchat
hootsuite
huddle
hulu
hyves
iccp
icloud
iec-60870-5-104
imeet
imgur
instagram
instan-t
ip-messenger
ipsec
irc
issuu
itunes
jira
join-me
jumpshare
kaixin
kaixin001
kakaotalk
laiwang
landesk
linkedin
live-mesh
lotus-notes
lotuslive
lucidpress
mail.ru
mail.ru-agent
maytech
meebo
meetup
mega
mendeley
mercurial
mixi
modbus
ms-ds-smb
ms-lync
ms-office365
ms-onedrive
msn
myspace
nateon-im
netease-webdisk
netflix
ning
noteworthy
now-tv
odnoklassniki
onehub
owncloud
paltalk
pastebin
pcanywhere
pinterest
pivotaltracker
powow
prezi
proofhub
qik
qliksense-cloud
qq
quip
quora
rally-software
readytalk
reddit
rediffbol
renren
rtp
salesforce
sap-jam
screencast
scribd
second-life
secure-data-space
sendthisfile
service-now
sharefile
sharepoint
sharevault
showmax
siemens-s7
signiant
sina-uc
sina-weibo
skydrive
slack
slideshare
smartsheet
snmp
softros-messenger
solarwinds
soundcloud
sourceforge
spark-im
ss7-map
stocktwits
storify
subversion
surveymonkey
syncplicity
tableau
teamdrive
teamup-calendar
teamviewer
thwapr
torch-browser
trello
tumblr
twitter
uc-yun
viber
vimeo
vine
virustotal
vkontakte
vnc
watchdox
webex
wechat
weiyun
whatsapp
windows-azure
windows-defender-atp
workday
yahoo-im
yammer
youku
yousendit
youtube
yunpan360
yy-voice
zalo
zendesk
zenefits
zettahost
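The output above lists only the parent apps. If the sub-apps are wanted as well (2ch together with 2ch-base and 2ch-posting, as in the question), one possible approach is to walk the rows in page order and attach each child row to the most recent parent. The sketch below shows that grouping logic on a small hand-written fragment using only the standard library. It rests on assumptions that are not confirmed by the page itself: that ottawagroup='0' marks an app without subdomains, that ottawagroup='2' marks a child row, and that any other value marks a parent. The SAMPLE markup, the "example-app" names and the group_apps function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment that imitates the grid rows.
# The meaning given to the ottawagroup values is an assumption.
SAMPLE = """
<tbody id="bodyScrollingTable">
  <tr ottawagroup="0"><td><a>example-app</a></td></tr>
  <tr ottawagroup="1"><td><a>2ch</a></td></tr>
  <tr ottawagroup="2"><td><a>2ch-base</a></td></tr>
  <tr ottawagroup="2"><td><a>2ch-posting</a></td></tr>
  <tr ottawagroup="0"><td><a>another-example-app</a></td></tr>
</tbody>
"""


def group_apps(fragment):
    """Map each parent app name to the list of its child app names."""
    groups = {}
    current = None  # name of the parent row seen most recently
    for row in ET.fromstring(fragment.strip()).iter("tr"):
        link = row.find("./td/a")
        if link is None or not link.text:
            continue
        name = link.text.strip()
        kind = row.get("ottawagroup")
        if kind == "0":
            current = None  # standalone app: closes any open group
        elif kind == "2":
            if current is not None:
                groups[current].append(name)  # child of the last parent
        else:
            current = name  # parent row: start a new group
            groups[name] = []
    return groups


print(group_apps(SAMPLE))  # prints {'2ch': ['2ch-base', '2ch-posting']}
```

With Selenium, the same loop could run over the tr elements returned by the wait, reading row.get_attribute("ottawagroup") and the link text of each row, provided the assumptions about the attribute values hold for the live page.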