


I'm using BeautifulSoup with Python. I'm trying to get elements from a link containing a hash (#). It's a pagination link, and the part after the # is the page number.


It doesn't work. I understand the problem is that urllib2 can't handle this: the part of the URL after the # (the fragment) is meant for client-side handling and is never sent to the server.
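A quick way to see what actually reaches the server is to split the fragment off the link. This is a minimal sketch in Python 3 (`urllib.parse` replaces urllib2's URL helpers there); the link itself is a placeholder in the `#/page-N` style used later in this post.

```python
from urllib.parse import urldefrag

# Placeholder pagination link in the "#/page-N" style from the question.
link = "http://www.myserver.com/products#/page-2"

# urldefrag() separates everything after '#'. Only the first part is what an
# HTTP client such as urllib sends in its request; the fragment stays local.
sent_to_server, fragment = urldefrag(link)
print(sent_to_server)  # http://www.myserver.com/products
print(fragment)        # /page-2
```

Every page link therefore produces the same request, which is why fetching them with urllib2 returns the same HTML each time.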


So I checked the real URL in the Network tab of Chrome's developer tools, and it shows this request:

http://www.myserver.com/modules/blocklayered/blocklayered-ajax.php?_=1486617675431&id_category_layered=24&layered_weight_slider=0_10&layered_price_slider=21_2991&orderby=position&orderway=desc&n=20&p=3
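Since the page number travels as an ordinary query parameter in this AJAX request, the URL for any page can be rebuilt directly. The sketch below copies the parameter names and values from the captured request; reading `p` as the page number and `n` as items per page is an assumption based on the values, and the `_` parameter (which looks like a jQuery cache-busting timestamp) is deliberately left out, which the server may or may not accept.

```python
from urllib.parse import urlencode

# Endpoint copied from the request seen in Chrome's Network tab.
AJAX_ENDPOINT = "http://www.myserver.com/modules/blocklayered/blocklayered-ajax.php"

def build_page_url(page, per_page=20):
    """Rebuild the captured AJAX URL for another page (hypothetical helper).

    Assumes p = page number and n = items per page. The "_" timestamp
    parameter from the captured request is omitted.
    """
    params = {
        "id_category_layered": 24,
        "layered_weight_slider": "0_10",
        "layered_price_slider": "21_2991",
        "orderby": "position",
        "orderway": "desc",
        "n": per_page,
        "p": page,
    }
    # urlencode keeps the dict's insertion order and escapes values safely.
    return AJAX_ENDPOINT + "?" + urlencode(params)

# Usage (network access required, so not run here):
#   from urllib.request import urlopen
#   body = urlopen(build_page_url(3)).read().decode("utf-8")
```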


It looks like the server doesn't like this URL at all, because it returns a blank page containing only this weird result: {"filtersBlock":"\n\n
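That result starts with `{"filtersBlock":`, which looks like the beginning of a JSON object rather than a broken HTML page: the endpoint seems to answer with JSON whose values are HTML snippets. If so, the usual approach is to decode the JSON first and only then hand the HTML strings to BeautifulSoup. In this sketch only the `filtersBlock` key comes from the question; the `productList` and `count` keys in the sample payload are made up for illustration.

```python
import json

def html_fragments(payload):
    """Decode a JSON response and keep only its string values (HTML snippets).

    Hypothetical helper: non-string values such as counters are dropped.
    """
    data = json.loads(payload)
    return {key: value for key, value in data.items() if isinstance(value, str)}

# Made-up payload shaped like the truncated response in the question.
sample = (
    '{"filtersBlock": "\\n\\n<div id=\\"filters\\"></div>", '
    '"productList": "<ul><li>Item</li></ul>", "count": 20}'
)
fragments = html_fragments(sample)
print(sorted(fragments))  # ['filtersBlock', 'productList']
```

Each fragment can then be parsed on its own, for example with `BeautifulSoup(fragments["productList"], "html.parser")`.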


So my question is: is there a way to handle this kind of link with BeautifulSoup?



I found a way to do this: I use BeautifulSoup to parse the DOM and Selenium to handle the links containing a #. Just passing the link containing the # to the Selenium driver with driver.get("http://www.myserver.com/products#/page-2") works.
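The Selenium approach can be sketched as two small helpers: one builds the `#/page-N` link and one loads it in a real browser and returns the DOM after JavaScript has run, ready for BeautifulSoup. `driver.get()` and `driver.page_source` are standard Selenium WebDriver members; the helper names and the fixed sleep are assumptions for illustration. Because the product list is filled in by AJAX after the hash changes, an explicit wait on a known element (Selenium's `WebDriverWait`) would be more robust than sleeping.

```python
import time

def page_url(base, page):
    """Build a hash-based pagination link such as .../products#/page-2 (hypothetical helper)."""
    return f"{base}#/page-{page}"

def rendered_html(driver, url, settle_seconds=2.0):
    """Load `url` in a Selenium WebDriver and return the DOM after scripts ran.

    The fixed sleep is a crude assumption about how long the AJAX call
    takes; replace it with an explicit wait for real use.
    """
    driver.get(url)
    time.sleep(settle_seconds)
    return driver.page_source

# Usage (needs selenium, bs4 and a browser driver installed):
#   from selenium import webdriver
#   from bs4 import BeautifulSoup
#   driver = webdriver.Chrome()
#   try:
#       html = rendered_html(driver, page_url("http://www.myserver.com/products", 2))
#       soup = BeautifulSoup(html, "html.parser")
#   finally:
#       driver.quit()
```

Keeping the driver as a parameter means one browser session can be reused for every page instead of starting a new one per link.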


11-01 11:30