Web scraping - how to access content rendered in JavaScript via Angular.js?

Question

I want to scrape data from a public site.

The page http://www.asx.com.au/asx/research/company.do#!/ACB/details contains a div with class 'view-content', which holds the information I need.

But when I fetch this page with Python's urllib2.urlopen, that div is empty:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.asx.com.au/asx/research/company.do#!/ACB/details'
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page, "html.parser")
contentDiv = soup.find("div", {"class": "view-content"})
print(contentDiv)

# the result is an empty div:
# <div class="view-content" ui-view=""></div>

Is it possible to access the contents of that div programmatically?

Edit: based on the comments, it appears that the content is rendered via Angular.js. Is it possible to trigger the rendering of that content from Python?

Answer

This page uses JavaScript to load the data from the server and fill in the page. urlopen only downloads the raw HTML and does not execute any JavaScript, so the div stays empty.
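A quick way to see why the downloaded HTML cannot contain the data: everything after # in the page URL (here the Angular route !/ACB/details) is a URL fragment, which the client never sends to the server. The browser keeps it, and Angular's JavaScript reads it to decide what to render and which data to request. This sketch uses only the Python 3 standard library; building a Request object does not open a connection, so nothing is downloaded.

```python
from urllib.parse import urldefrag
from urllib.request import Request

page_url = 'http://www.asx.com.au/asx/research/company.do#!/ACB/details'

# Split off the fragment: this part is only seen by the browser (and Angular).
base, fragment = urldefrag(page_url)
print(base)      # http://www.asx.com.au/asx/research/company.do
print(fragment)  # !/ACB/details

# Creating a Request does not touch the network. Its selector is the path
# that would actually be sent in the HTTP request, without the fragment.
req = Request(page_url)
print(req.selector)  # /asx/research/company.do
```

Whatever follows #!, the server returns the same Angular shell page, and only JavaScript running in a browser turns the route into the filled-in div.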

I see you use Chrome's developer tools: open the "Network" tab and look at the "XHR" or "JS" requests.

I found this URL:

http://data.asx.com.au/data/1/company/ACB?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices&callback=angular.callbacks._0

This URL returns all the data in an almost-JSON format.
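The "almost JSON" is most likely JSONP: because of the callback=angular.callbacks._0 parameter, the JSON comes wrapped in a call to that function, roughly angular.callbacks._0({...});. If you keep the callback parameter, you can still strip the wrapper yourself. A minimal sketch, assuming the response has the form name(...) or name(...); where unwrap_jsonp is a hypothetical helper and the sample string is made up for illustration, not a real ASX response:

```python
import json
import re

# Hypothetical helper pattern: a dotted callback name, "(", the JSON payload,
# ")", and an optional trailing semicolon. DOTALL lets the payload span lines.
_JSONP = re.compile(r'^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def unwrap_jsonp(text):
    """Strip a JSONP wrapper such as angular.callbacks._0(...); and parse the JSON inside."""
    match = _JSONP.match(text)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


# Illustrative input only; a real response contains many more fields.
sample = ('angular.callbacks._0({"principal_activities": '
          '"Mineral exploration in Botswana, China and Australia."});')
data = unwrap_jsonp(sample)
print(data['principal_activities'])  # Mineral exploration in Botswana, China and Australia.
```

With a real response, decode the bytes first, e.g. unwrap_jsonp(urlopen(url).read().decode('utf-8')), assuming the endpoint returns UTF-8.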

But if you request this URL without &callback=angular.callbacks._0, you get the data as pure JSON, and you can use the json module to convert it to a Python dictionary.
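If you copied the URL from the Network tab, you can also remove the callback parameter programmatically instead of editing it by hand. A sketch using only urllib.parse; drop_query_param is a hypothetical helper name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

jsonp_url = ('http://data.asx.com.au/data/1/company/ACB'
             '?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices'
             '&callback=angular.callbacks._0')


def drop_query_param(url, name):
    """Return url with every query parameter called `name` removed (hypothetical helper)."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != name]
    # safe=',' keeps the comma-separated field list readable instead of %2C
    return urlunsplit(parts._replace(query=urlencode(query, safe=',')))


json_url = drop_query_param(jsonp_url, 'callback')
print(json_url)
# -> http://data.asx.com.au/data/1/company/ACB?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices
```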

Edit: working code

import json
from urllib.request import urlopen  # Python 3; in Python 2 this was urllib2.urlopen

# new URL: the data endpoint from the Network tab, without &callback=...
url = 'http://data.asx.com.au/data/1/company/ACB?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices'

# read all data (as bytes)
page = urlopen(url).read()

# convert JSON text to a Python dictionary
data = json.loads(page)

print(data['principal_activities'])

Output:

Mineral exploration in Botswana, China and Australia.
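The working code hard-codes the ACB company code. A sketch of a small reusable wrapper, assuming (not verified here) that other ASX codes are served by the same endpoint pattern http://data.asx.com.au/data/1/company/<CODE> with the same fields parameter; company_url and fetch_company are hypothetical names:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: every company code uses the same endpoint pattern as ACB.
BASE = 'http://data.asx.com.au/data/1/company/'
DEFAULT_FIELDS = ('primary_share', 'latest_annual_reports',
                  'last_dividend', 'primary_share.indices')


def company_url(code, fields=DEFAULT_FIELDS):
    """Build the plain-JSON URL (no callback parameter) for one company code."""
    query = urlencode({'fields': ','.join(fields)}, safe=',')
    return BASE + quote(code.upper()) + '?' + query


def fetch_company(code, timeout=10):
    """Download the company record and decode it into a dict (needs network access)."""
    with urlopen(company_url(code), timeout=timeout) as response:
        return json.loads(response.read())
```

For example, fetch_company('ACB')['principal_activities'] should give the same text as the working code, provided the endpoint still responds the way it did when this answer was written.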