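Context:
I am trying to code my own money aggregator, since most tools available on the market do not yet cover all financial websites. I am using Python 2.7.9 on a Raspberry Pi.
So far, thanks to the requests library, I have managed to connect to two of my accounts (one crowdlending website and one for my pension).
The third website I am trying to integrate, https://www.amundi-ee.com, has been giving me a hard time for two weeks now.
I figured out that the website actually uses JavaScript, and after a lot of research I ended up using dryscrape (I cannot use Selenium because ARM is no longer supported).
Issue:
When running this code: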
import dryscrape
url='https://www.amundi-ee.com'
extensionInit='/psf/#login'
extensionConnect='/psf/authenticate'
extensionResult='/psf/#'
urlInit = url + extensionInit
urlConnect = url + extensionConnect
urlResult = url + extensionResult
s = dryscrape.Session()
s.visit(urlInit)
print s.body()
login = s.at_xpath('//*[@id="identifiant"]')
login.set("XXXXXXXX")
pwd = s.at_xpath('//*[@name="password"]')
pwd.set("YYYYYYY")
# Push the button
login.form().submit()
s.visit(urlConnect)
print s.body()
s.visit(urlResult)
I run into a problem when the code visits urlConnect (line 21); the body printed on line 22 returns the following:
{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}
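The body is plain JSON, so it can be decoded to see which HTTP method the route accepts. The sketch below does that with the standard library only; the helper name allowed_methods is made up for illustration, and the regular expression assumes the "(Allow: ...)" wording seen in this particular message.

```python
import json
import re

# Error body copied verbatim from the output above.
body = r'{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}'


def allowed_methods(error_body):
    """Return (status code, list of allowed HTTP methods) from the JSON error body."""
    payload = json.loads(error_body)
    # Assumption: the allowed methods appear as "(Allow: A, B)" in the message.
    match = re.search(r'\(Allow: ([^)]*)\)', payload["message"])
    methods = [m.strip() for m in match.group(1).split(',')] if match else []
    return payload["code"], methods


code, methods = allowed_methods(body)
```

Decoded this way, the message reports that the request was a GET on /authenticate while the route only allows POST.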
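Question
Why do I get this error message, and how can I log in to the website properly to retrieve the data I need?
PS: my code was inspired by this question:
Python dryscrape scrape page with cookies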
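Best answer
OK, after more than a month of trying, I am happy to say that I finally managed to get what I wanted.
What was the problem?
Basically two major things (maybe more, but I may have forgotten some in the meantime):
1. The password has to be entered by clicking buttons, and those buttons are randomly generated, so a new mapping is needed on every visit.
2. login.form().submit() was messing up access to the page with the needed data; clicking the validate button instead was enough.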
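The first point, rebuilding the digit-to-button mapping on every visit, can be sketched without a browser. This is only an illustration, not the site's actual markup: the sample HTML, the helper name keypad_xpaths and the PIN 4071 are all hypothetical, and it assumes that the keypad buttons are the first buttons on the page and sit directly under the element with id num-pad. It uses the standard-library HTML parser instead of BeautifulSoup purely to stay dependency-free.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class ButtonTextCollector(HTMLParser):
    """Collect the text of every <button> element, in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.labels = []
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        if tag == 'button':
            self._in_button = True
            self.labels.append('')

    def handle_endtag(self, tag):
        if tag == 'button':
            self._in_button = False

    def handle_data(self, data):
        if self._in_button:
            self.labels[-1] += data


def keypad_xpaths(page_html, password):
    """Return one XPath per password digit for the current keypad layout."""
    collector = ButtonTextCollector()
    collector.feed(page_html)
    position = {}  # digit shown on the button -> 1-based button position
    for index, label in enumerate(collector.labels):
        label = label.strip()
        if label.isdigit():
            position[label] = index + 1  # XPath positions start at 1
    return ['//*[@id="num-pad"]/button[%d]' % position[d] for d in password]


# Hypothetical keypad with the digits in a shuffled order.
sample_html = ('<div id="num-pad">'
               + ''.join('<button> %s </button>' % d for d in '7305281946')
               + '</div>')
xpaths = keypad_xpaths(sample_html, '4071')
```

Each returned XPath could then be passed to s.at_xpath(...) and clicked, in the order of the password digits. Because the mapping is rebuilt from the page body each time, it does not matter how the site shuffles the keypad.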
Here is the final code. Please comment if you spot any bad usage, as I am a Python novice and only an occasional coder.
import dryscrape
from bs4 import BeautifulSoup
from lxml import html
from time import sleep
from webkit_server import InvalidResponseError
from decimal import Decimal
import re
import sys
def getAmundi(seconds=0):
    url = 'https://www.amundi-ee.com/psf'
    extensionInit = '/#login'
    urlInit = url + extensionInit
    urlResult = url + '/#'
    timeoutRetry = 1

    if 'linux' in sys.platform:
        # start xvfb in case no X is running. Make sure xvfb
        # is installed, otherwise this won't work!
        dryscrape.start_xvfb()

    print "connecting to " + url + " with " + str(seconds) + "s of loading wait..."
    s = dryscrape.Session()
    s.visit(urlInit)
    sleep(seconds)
    s.set_attribute('auto_load_images', False)
    s.set_header('User-agent', 'Google Chrome')

    # retry until the login field is available, waiting longer each time
    while True:
        try:
            q = s.at_xpath('//*[@id="identifiant"]')
            q.set("XXXXXXXX")
        except Exception as ex:
            seconds += timeoutRetry
            print "Failed, retrying to get the login field in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break

    # get password button mapping: button_number[digit] = index of the
    # button showing that digit (the layout changes on every visit)
    print "logging in ..."
    soup = BeautifulSoup(s.body())
    buttons = soup.findAll('button')
    button_number = range(10)  # Python 2: range() returns a list
    for x in range(0, 10):
        button_number[int(buttons[x].text.strip())] = x

    # needed buttons (+1 because XPath positions start at 1)
    button_1 = button_number[1] + 1
    button_2 = button_number[2] + 1
    button_3 = button_number[3] + 1
    button_5 = button_number[5] + 1

    # push buttons for password
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_2) + ']')
    button.click()
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_1) + ']')
    button.click()
    ..............

    # Push the validate button
    button = s.at_xpath('//*[@id="content"]/router-view/div/form/div[3]/input')
    button.click()
    print "accessing ..."
    sleep(seconds)

    # retry until the data page has loaded and the amount can be parsed
    while True:
        try:
            soup = BeautifulSoup(s.body())
            total_lended = soup.findAll('span')[8].text.strip()
            # drop non-ASCII characters and spaces, use '.' as decimal separator
            total_lended = Decimal(total_lended.encode('ascii', 'ignore').replace(',', '.').replace(' ', ''))
            print total_lended
        except Exception as ex:
            seconds += 1
            print "Failed, retrying to get the data in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break

    s.reset()
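The two `while True` loops above retry forever if the page never loads, and the amount parsing relies on Python 2 string handling. As a hedged alternative, here is a small sketch of a bounded retry with a growing wait plus a standalone amount parser. The names parse_amount and retry, the default of 5 attempts and the sample amount are assumptions for illustration, not part of the original code, and the parser only handles the "1 234,56" style of formatting.

```python
import re
import time
from decimal import Decimal


def parse_amount(text):
    """Convert an amount such as '1 234,56 EUR' (decimal comma) to Decimal."""
    # Keep only digits, the decimal comma and a minus sign; this drops
    # regular and non-breaking spaces as well as any currency symbol.
    cleaned = re.sub(r'[^0-9,-]', '', text)
    return Decimal(cleaned.replace(',', '.'))


def retry(action, attempts=5, first_wait=1, step=1, sleep=time.sleep):
    """Call action() until it succeeds, waiting a bit longer after each failure."""
    wait = first_wait
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up instead of looping forever
            sleep(wait)
            wait += step


# Stand-in for the page: "fails" twice (None), then yields the amount text.
responses = iter([None, None, u'1\u00a0234,56 \u20ac'])
waits = []  # record the waits instead of actually sleeping
total = retry(lambda: parse_amount(next(responses)), sleep=waits.append)
```

In the scraper, the action passed to retry would be a function that reads s.body() and extracts the span; passing sleep as a parameter is only there so the example runs instantly.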
Regarding javascript - dryscrape: "No route found for…", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43833051/