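So, I'm at a bit of a crossroads with this Google Apps Script. Coming from traditional JavaScript, it's quite a challenge. I'm currently trying to pull values from Zillow. I've succeeded with the first few items (rent value, Zestimate, school ratings), but now I need the school names, and this one is a real headache. I'm genuinely stuck: I can't seem to write a .match() that gets what I need. I'll post some code and see if anyone else can get a handle on this.
The Zillow HTML I'm parsing: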

<ul class="nearby-schools-list">
<li class="nearby-schools-header">
    <h4 class="nearby-schools-rating">&nbsp;</h4>
    <h4 class="nearby-schools-name">&nbsp;</h4>
    <h4 class="nearby-schools-grades">Grades</h4>
    <h4 class="nearby-schools-distance">Distance</h4>
</li>
<li class="nearby-school assigned-school">
    <span class="gs-rating-badge">
        <div class="gs-rating gs-rating-8">
            <span class="gs-rating-number">8</span>
            <span class="gs-rating-subtext">out of 10</span>
        </div>
    </span>
    <span class="nearby-schools-name"> <a href="/seattle-wa/schools/salmon-bay-school-93956/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Salmon Bay School</a>
        <span class="assigned-label de-emph">(assigned)</span>
    </span>
    <span class="nearby-schools-grades">K-8</span>
    <span class="nearby-schools-distance">0.3 mi</span>
</li>
<li class="nearby-school assigned-school">
    <span class="gs-rating-badge">
        <div class="gs-rating gs-rating-8">
            <span class="gs-rating-number">8</span>
            <span class="gs-rating-subtext">out of 10</span>
        </div>
    </span>
    <span class="nearby-schools-name"> <a href="/seattle-wa/schools/whitman-middle-school-93939/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Whitman Middle</a>
        <span class="assigned-label de-emph">(assigned)</span>
    </span>
    <span class="nearby-schools-grades">6-8</span>
    <span class="nearby-schools-distance">1.4 mi</span>
</li>
<li class="nearby-school assigned-school">
    <span class="gs-rating-badge">
        <div class="gs-rating gs-rating-9">
            <span class="gs-rating-number">9</span>
            <span class="gs-rating-subtext">out of 10</span>
        </div>
    </span>
    <span class="nearby-schools-name"> <a href="/seattle-wa/schools/ballard-high-school-92363/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Ballard High</a>
        <span class="assigned-label de-emph">(assigned)</span>
    </span>
    <span class="nearby-schools-grades">9-12</span>
    <span class="nearby-schools-distance">0.2 mi</span>
</li>

It's a big chunk, but essentially I'm trying to get the text of school-name, a class found at ul > li > span.nearby-schools-name > a.school-name.
Here's my attempt. Everything I try comes up empty.
// get School Names
var match = contentText.match(/<a href="([^<]*)" class="ga-tracked-link track-ga-event school-name notranslate" /g);
Browser.msgBox(match);
var schoolNameArray = [];

// match is null when nothing matches, so guard before reading .length
while (match && match.length > 0) {
    var thisSchoolName = String(match.pop());
    Browser.msgBox(thisSchoolName);
    //schoolNameArray.push(thisSchoolName);
}

var schoolNames = schoolNameArray.toString().replace(/,/g, " _ ");
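For comparison, here is a minimal sketch of a regex version that does capture the names. The pattern above ends at the closing quote of the class attribute, so the link text (the school name itself) is never part of the match; adding a capture group for the text between > and </a> fixes that. getSchoolNames is a hypothetical helper, and the pattern assumes the markup in the excerpt (a class list containing school-name, plain text inside the link). Parsing HTML with a regex is fragile and will break if Zillow changes its markup.

```javascript
// Sketch: collect the text of every <a> whose class list contains "school-name".
// Assumes the markup from the Zillow excerpt: the link text is plain text with
// no nested tags. getSchoolNames is a hypothetical helper name.
function getSchoolNames(html) {
  // Capture group 1 is the link text between ">" and "</a>".
  var pattern = /<a\b[^>]*class="[^"]*\bschool-name\b[^"]*"[^>]*>([^<]*)<\/a>/g;
  var names = [];
  var m;
  // exec() in a loop returns each match together with its capture groups.
  while ((m = pattern.exec(html)) !== null) {
    names.push(m[1].trim());
  }
  return names;
}
```

The exec() loop is used because String.prototype.match with the g flag returns only the whole matched strings, not the capture groups. With the question's variable, usage would look like var schoolNames = getSchoolNames(contentText).join(" _ ");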

As a quick FAQ: I've tried the getElementsByClassName replicas floating around the web, with no luck. I also tried grabbing

Best Answer

Here's one way to do it. First, get all the elements with that class name:

var elSchoolNames = document.getElementsByClassName("nearby-schools-name");

What comes back is an object. If you print the variable elSchoolNames to the console with console.log('elSchoolNames: ' + elSchoolNames);, it shows:
[object HTMLCollection]

Inside that [object HTMLCollection] is a list of further objects:
[object HTMLHeadingElement]
[object HTMLSpanElement]
[object HTMLSpanElement]
[object HTMLSpanElement]

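It's important to understand that while objects usually hold key:value pairs, an HTMLCollection is array-like: its items have no descriptive property names. To get a child object out of the collection, refer to it by its numeric index.
You need all the span elements. Index 0 is the h4 header, so start at 1:
var theSpanEl = elSchoolNames[1];
var theSpanEl2 = elSchoolNames[2];
var theSpanEl3 = elSchoolNames[3];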

console.log('textContent: ' + theSpanEl.textContent);

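The school's name is in each element's textContent property.
How did I find out which objects were inside the collection, and what the first span element contained? I looped through all of their properties: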
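Rather than hardcoding indices 1, 2 and 3, a short loop can read textContent from every element after the header. This is a minimal sketch that assumes the header cell is always at index 0, as in the console output above; collectSchoolTexts is a hypothetical helper name.

```javascript
// Sketch: read textContent from every element after the first one.
// Assumption: index 0 is the <h4 class="nearby-schools-name"> header cell,
// so the loop starts at 1. Works on an HTMLCollection or any array-like
// object that has .length and numeric indices.
function collectSchoolTexts(elements) {
  var texts = [];
  for (var i = 1; i < elements.length; i++) {
    texts.push(elements[i].textContent.trim());
  }
  return texts;
}
```

In a browser page you would pass it the collection directly: collectSchoolTexts(document.getElementsByClassName("nearby-schools-name")). Each entry still ends with the (assigned) label.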
var elSchoolNames = document.getElementsByClassName("nearby-schools-name");
console.log('namesOfSchools: ' + elSchoolNames);

for (var theProperty in elSchoolNames) {
    console.log('theProperties: ' + theProperty);
    console.log('each value: ' + elSchoolNames[theProperty]);
}

var theSpanEl = elSchoolNames[1];

for (var spanProperty in theSpanEl) {
    console.log('theProperties: ' + spanProperty);
    console.log('each value: ' + theSpanEl[spanProperty]);
}

console.log('textContent: ' + theSpanEl.textContent);

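To get the child elements you want, take each element after the first one (the first is the h4 header). Because the collection is zero-indexed, the second element is at index 1: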
var theSpanEl = elSchoolNames[1];

Now, to see what you have, print it to the console:
console.log('textContent: ' + theSpanEl.textContent);

That gives you:
textContent:  Salmon Bay School
    (assigned)

You'll still need to strip the trailing (assigned) with a string method, but you don't need .match() or a regex for any of this.
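A minimal sketch of that string-method cleanup, assuming the label is always the literal text (assigned) at the very end; stripAssignedLabel is a hypothetical name.

```javascript
// Sketch: drop a trailing "(assigned)" label using plain string methods (no regex).
function stripAssignedLabel(text) {
  var label = "(assigned)";
  var trimmed = text.trim();                 // remove surrounding spaces/newlines
  var pos = trimmed.lastIndexOf(label);
  // Only cut the label when it really sits at the end of the text.
  if (pos !== -1 && pos === trimmed.length - label.length) {
    trimmed = trimmed.slice(0, pos);
  }
  return trimmed.trim();                     // remove whitespace left before the label
}
```

For the text shown above, stripAssignedLabel(theSpanEl.textContent) should return "Salmon Bay School"; names without the label come back unchanged apart from trimming.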
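I just realized that none of this works if you're pulling the HTML from a site that isn't yours and the HTML arrives as a string, unless you inject it into your own page with innerHTML and then run the code above.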
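If you'd rather not inject the fetched HTML into a page, another option is to scan the string directly. This is a minimal sketch using only indexOf and slice. It assumes each name is the plain text of an <a> tag whose class attribute contains school-name, as in the excerpt in the question. The function name is hypothetical, and the scan will break if Zillow changes its markup.

```javascript
// Sketch: extract school names from an HTML string with indexOf/slice only.
// Assumption (hypothetical helper): each name is the plain text of an <a> tag
// whose class attribute contains "school-name", as in the Zillow excerpt.
function schoolNamesFromHtmlString(html) {
  var names = [];
  var from = 0;
  while (true) {
    var open = html.indexOf('<a ', from);        // start of the next opening <a ...> tag
    if (open === -1) break;
    var tagEnd = html.indexOf('>', open);        // end of that opening tag
    if (tagEnd === -1) break;
    var close = html.indexOf('</a>', tagEnd);    // the link's closing tag
    if (close === -1) break;

    // Read the class attribute of the opening tag, if it has one.
    var openingTag = html.slice(open, tagEnd);
    var classStart = openingTag.indexOf('class="');
    if (classStart !== -1) {
      var valueStart = classStart + 'class="'.length;
      var valueEnd = openingTag.indexOf('"', valueStart);
      // Pad with spaces so " school-name " only matches a whole class name.
      var classes = ' ' + openingTag.slice(valueStart, valueEnd) + ' ';
      if (valueEnd !== -1 && classes.indexOf(' school-name ') !== -1) {
        names.push(html.slice(tagEnd + 1, close).trim());
      }
    }
    from = close + '</a>'.length;                // continue after this link
  }
  return names;
}
```

In Apps Script you might call it on the contentText variable from the question, for example schoolNamesFromHtmlString(contentText).join(" _ ").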

For a similar question on Stack Overflow, see "javascript - Extracting text from HTML tags via getContext - Google Apps Script - Spreadsheet": https://stackoverflow.com/questions/24539931/

10-10 06:52