Error on line 1: Content is not allowed in prolog

Question

I am trying to scrape a table of price data from this website using the following code:

function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://stooq.com/q/d/?s=barc.uk&i=d";

  var fromText = '<td align="center" id="t03">';
  var toText = '</td>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();

  // Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();
}

I took this method from an approach I used in a similar question here; however, it is failing on this particular URL and giving me the error:

Error on line 1: Content is not allowed in prolog. (line 12, file "Stooq")

Related questions here and here discuss textual content being submitted to the parser that it does not accept; however, I am unable to apply the solutions in those questions to my own problem. Any help would be much appreciated.

Recommended answer

How about this modification?

  • In this case, the retrieved HTML needs to be modified before it is parsed. For example, in the value returned by var content = UrlFetchApp.fetch(url).getContentText(), the attribute values are not enclosed in quotes, which XML requires. These need to be modified.
  • There is a merged column in the header.
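
To make the first point concrete, here is a minimal sketch of the attribute-quoting step on its own, written as plain JavaScript so it can be tried outside Apps Script. The function name quoteAttributes and the sample fragment are hypothetical; only the two replace calls are taken from the script in this answer. Note that this is a blunt text substitution: it would also rewrite any "=value" sequence that appears in ordinary cell text, so it is only suitable for input you have inspected.

```javascript
// Hypothetical helper wrapping the two replacements used in this answer.
function quoteAttributes(html) {
  return html
    // Wrap unquoted attribute values in double quotes, e.g. align=center -> align="center".
    // Already quoted values are left alone because '"' is not in the character class.
    .replace(/=([a-zA-Z0-9\%-:]+)/g, "=\"$1\"")
    // Remove the valueless "nowrap" attribute, which XML does not allow.
    .replace(/nowrap/g, "");
}

// Made-up fragment in the style of unquoted HTML; not copied from the real page.
var sample = '<td align=center id=t03 nowrap>x</td>';
console.log(quoteAttributes(sample));
// prints: <td align="center" id="t03" >x</td>
```

After this step every attribute has a quoted value, which is what an XML parser such as XmlService expects.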

When the above points are reflected in the script, it becomes as follows.

function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://stooq.com/q/d/?s=barc.uk&i=d";
  var fromText = '#d9d9d9}</style>';
  var toText = '<table';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();

  // Modify values
  scraped = scraped.replace(/=([a-zA-Z0-9\%-:]+)/g, "=\"$1\"").replace(/nowrap/g, "");

  // Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();

  // Retrieve header and modify it.
  var headerTr = root.getChild("thead").getChildren();
  var res = headerTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  res[0].splice(7, 0, "Change");

  // Retrieve values.
  var valuesTr = root.getChild("tbody").getChildren();
  var values = valuesTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  Array.prototype.push.apply(res, values);

  // Put the result to the active spreadsheet.
  var ss = SpreadsheetApp.getActiveSheet();
  ss.getRange(1, 1, res.length, res[0].length).setValues(res);
}
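
The splice call in the script deals with the merged header column: a header cell that spans two columns yields one fewer value than each data row, and setValues needs every row to have the same length. The sketch below illustrates this in plain JavaScript. The column names, the placeholder row, and the helper names padMergedHeader and isRectangular are all hypothetical and only stand in for the real table.

```javascript
// Hypothetical header: "Change" is assumed to span two columns, so it appears once.
var header = ["No.", "Date", "Open", "High", "Low", "Close", "Change", "Volume"];
// Placeholder data row with one value per actual column (9 values).
var rows = [["n", "date", "o", "h", "l", "c", "chg1", "chg2", "vol"]];

// Return a copy of the header with an extra label inserted at the given index.
function padMergedHeader(headerRow, index, label) {
  var copy = headerRow.slice();
  copy.splice(index, 0, label);
  return copy;
}

// True when every row has the same number of cells as the first row.
function isRectangular(table) {
  return table.every(function (row) {
    return row.length === table[0].length;
  });
}

var before = [header].concat(rows);
var after = [padMergedHeader(header, 7, "Change")].concat(rows);
console.log(isRectangular(before)); // false
console.log(isRectangular(after));  // true
```

If the merged cell sits at a different position on another page, the index passed to splice would need to change accordingly.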

Note:

  • Before you run this modified script, please install the Parser library for GAS.
  • This modified script is specific to the URL in your question and will not work for arbitrary URLs. If you want to retrieve values from another URL, please modify the script.

References:

  • Parser
  • XmlService

If this is not what you wanted, I apologize.


05-27 23:01