Problem description
I am going to use Google Apps Script to fetch the programme list from a radio station's website. How can I select specific elements in the web page by specifying the element's id, so that I can get the programmes listed on the page?
Recommended answer
Edit, Dec 2013: Google has deprecated the old Xml service, replacing it with XmlService. The script in this answer has been updated to use the new service. The new service requires standard-compliant XML & HTML, while the old one was forgiving of such problems as missing close-tags.
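Because XmlService.parse throws on markup that is not well-formed, it can help to wrap the call so that a bad page is reported rather than aborting the script. The following is only a minimal sketch: the helper name parseOrNull is hypothetical, and it assumes the Apps Script globals XmlService and Logger are available.

```javascript
/**
 * Hypothetical helper: parse markup with XmlService, returning null
 * (and logging the reason) when the markup is not well-formed.
 * Assumes the Apps Script globals XmlService and Logger.
 */
function parseOrNull(markup) {
  try {
    // Throws if the markup is not well-formed, e.g. a missing close-tag
    return XmlService.parse(markup);
  } catch (e) {
    Logger.log('Could not parse markup: ' + e.message);
    return null;
  }
}
```

A caller can then test the result for null before trying to navigate the document.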
Have a look at the Tutorial: Parsing an XML Document. (As of Dec 2013, this tutorial is still online, although the Xml service is deprecated.) Starting with that foundation, you can take advantage of the XML parsing in Script Services to navigate the page. Here's a small script operating on your example:
function getProgrammeList() {
  var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </html>';
  // Put the received XML response into XmlDocument format.
  // Note: this uses the deprecated Xml service in lenient mode.
  var doc = Xml.parse(txt, true);
  Logger.log(doc.html.body.div.div.div.id + " = "
      + doc.html.body.div.div.div.Text);   // here = hello world!!
  debugger;  // Pause in debugger - examine content of doc
}
To get the real page, start with this:
var url = 'http://blah.blah/whatever?querystring=foobar';
var txt = UrlFetchApp.fetch(url).getContentText();
....
If you look at the documentation for getElements (getChildren in the newer XmlService), you'll see that there is support for retrieving specific tags, for example "div". That finds only the direct children of a specific element; it doesn't explore the entire XML document. You should be able to write a function that traverses the document, examining the id of each div element until it finds your programme list.
var programmeList = findDivById(doc,"here");
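One way such a traversal could be written is as a depth-first recursion over direct children. This is only a sketch, not the utility given further down: it assumes the XmlService Element methods getName(), getChildren() and getAttribute(name), where getAttribute returns null when the attribute is absent, and it expects an Element as its starting point (for a parsed document, that would be doc.getRootElement()).

```javascript
/**
 * Sketch: depth-first search for the first <div> with the given id.
 * Assumes `element` behaves like an XmlService Element, i.e. it has
 * getName(), getChildren() and getAttribute(name).
 */
function findDivById(element, id) {
  // Check the current element first
  var idAttr = element.getAttribute('id');  // null if there is no id attribute
  if (element.getName() === 'div' && idAttr !== null && idAttr.getValue() === id) {
    return element;
  }
  // Then descend into each direct child in document order
  var children = element.getChildren();
  for (var i = 0; i < children.length; i++) {
    var found = findDivById(children[i], id);
    if (found !== null) {
      return found;
    }
  }
  // No match in this subtree
  return null;
}
```

Checking for a null attribute before calling getValue() matters here, because most elements on a real page have no id at all.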
Edit - I couldn't help myself...
Here's a utility function that will do just that.
/**
* Find a <div> tag with the given id.
* <pre>
* Example: getDivById( html, 'tagVal' ) will find
*
* <div id="tagVal">
* </pre>
*
* @param {Element|Document}
* element XML document or element to start search at.
* @param {String} id HTML <div> id to find.
*
* @return {Element} First matching element (in doc order) or null.
*/
function getDivById( element, id ) {
  // Call utility function to do the work.
  return getElementByVal( element, 'div', 'id', id );
}
/**
* !Now updated for XmlService!
*
* Traverse the given Xml Document or Element looking for a match.
* Note: 'class' is stripped during parsing and cannot be used for
* searching, I don't know why.
* <pre>
* Example: getElementByVal( body, 'input', 'value', 'Go' ); will find
*
* <input type="submit" name="btn" value="Go" id="btn" class="submit buttonGradient" />
* </pre>
*
* @param {Element|Document}
* element XML document or element to start search at.
* @param {String} elementType XML element type, e.g. 'div' for <div>
* @param {String} attr Attribute or Property to compare.
* @param {String} val Search value to locate
*
* @return {Element} First matching element (in doc order) or null.
*/
function getElementByVal( element, elementType, attr, val ) {
  // Get all descendants, in document order
  var descendants = element.getDescendants();
  for (var i = 0; i < descendants.length; i++) {
    var content = descendants[i];
    // We'll only examine ELEMENTs
    if (content.getType() == XmlService.ContentTypes.ELEMENT) {
      var candidate = content.asElement();
      if (candidate.getName() === elementType) {
        // getAttribute() returns null when the attribute is absent
        var attribute = candidate.getAttribute(attr);
        if (attribute !== null && attribute.getValue() === val) {
          return candidate;
        }
      }
    }
  }
  // No matches in document
  return null;
}
Applying this to your example, we get this:
function getProgrammeList() {
  var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </body> </html>';
  // Parse the received XML response into an XML document
  var doc = XmlService.parse(txt);
  var found = getDivById(doc.getRootElement(), 'here');
  Logger.log(found.getAttribute('id').getValue()
      + " = "
      + found.getValue());   // here = hello world!!
}
Note: See this answer for a practical example of the use of these utilities.