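So, I'm a bit torn about this Google Apps Script. Being used to traditional JavaScript, I find it quite a challenge. I'm currently trying to pull values from Zillow, and I've succeeded with the first couple of items (rent value, Zestimate, school rating), but now I need to get the school names. This is a real hassle and I'm truly stuck: I can't seem to write a .match() that gets what I need. I'll post some code and see if anyone else can get a handle on this.
The Zillow code I'm parsing: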
<ul class="nearby-schools-list">
<li class="nearby-schools-header">
<h4 class="nearby-schools-rating"> </h4>
<h4 class="nearby-schools-name"> </h4>
<h4 class="nearby-schools-grades">Grades</h4>
<h4 class="nearby-schools-distance">Distance</h4>
</li>
<li class="nearby-school assigned-school">
<span class="gs-rating-badge">
<div class="gs-rating gs-rating-8">
<span class="gs-rating-number">8</span>
<span class="gs-rating-subtext">out of 10</span>
</div>
</span>
<span class="nearby-schools-name"> <a href="/seattle-wa/schools/salmon-bay-school-93956/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Salmon Bay School</a>
<span class="assigned-label de-emph">(assigned)</span>
</span>
<span class="nearby-schools-grades">K-8</span>
<span class="nearby-schools-distance">0.3 mi</span>
</li>
<li class="nearby-school assigned-school">
<span class="gs-rating-badge">
<div class="gs-rating gs-rating-8">
<span class="gs-rating-number">8</span>
<span class="gs-rating-subtext">out of 10</span>
</div>
</span>
<span class="nearby-schools-name"> <a href="/seattle-wa/schools/whitman-middle-school-93939/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Whitman Middle</a>
<span class="assigned-label de-emph">(assigned)</span>
</span>
<span class="nearby-schools-grades">6-8</span>
<span class="nearby-schools-distance">1.4 mi</span>
</li>
<li class="nearby-school assigned-school">
<span class="gs-rating-badge">
<div class="gs-rating gs-rating-9">
<span class="gs-rating-number">9</span>
<span class="gs-rating-subtext">out of 10</span>
</div>
</span>
<span class="nearby-schools-name"> <a href="/seattle-wa/schools/ballard-high-school-92363/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Ballard High</a>
<span class="assigned-label de-emph">(assigned)</span>
</span>
<span class="nearby-schools-grades">9-12</span>
<span class="nearby-schools-distance">0.2 mi</span>
</li>
It's a big block, but essentially I'm trying to get the text from school-name, a class that sits under ul > li > span.nearby-schools-name > a.school-name. Here is my attempt; everything I try leaves me at a loss.
// get School Names
var match = contentText.match(/<a href="([^<]*)" class="ga-tracked-link track-ga-event school-name notranslate" /g);
Browser.msgBox(match);
var schoolNameArray = new Array();
while (match.length > 0) {
var thisSchoolName = new String(schoolName.pop());
Browser.msgBox(thisSchoolName);
//schoolNameArray.push(thisSchoolName);
}
var schoolNames = schoolNameArray.toString().replace(/,/g, " _ ");
As a quick FAQ: I tried replicating the functionality of getElementsByClassName with code found on the web, but had no success. I also tried grabbing
Best answer
Here is one way. First, get all of the elements by class name:
var elSchoolNames = document.getElementsByClassName("nearby-schools-name");
What comes back is an object. If you print the variable elSchoolNames to the console:
console.log('elSchoolNames: ' + elSchoolNames );
it will look like this:
[object HTMLCollection]
Inside the [object HTMLCollection] object is a set of further objects:
[object HTMLHeadingElement]
[object HTMLSpanElement]
[object HTMLSpanElement]
[object HTMLSpanElement]
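It's important to understand that objects have key:value pairs, but this one is also an array-like collection of objects that have no keys (property names). To get the child objects out of the main object, refer to them by number, because at that level they are array entries rather than named properties. You need all of the span elements.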
var theSpanEl = elSchoolNames[1];
var theSpanEl2 = elSchoolNames[2];
var theSpanEl3 = elSchoolNames[3];
console.log('textContent: ' + theSpanEl.textContent);
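Rather than declaring one variable per span, the same index-based access can be done in a loop. The following is only a minimal sketch: the helper name collectTextAfterHeader and the fakeCollection object are hypothetical, and the stand-in object exists only so the code runs without a page. In a browser you would pass in the real result of getElementsByClassName instead.

```javascript
// Collect the textContent of every entry after the header (index 0)
// in an array-like collection such as the one getElementsByClassName returns.
function collectTextAfterHeader(collection) {
  var texts = [];
  // Start at 1 because index 0 is the header's <h4>, not a school.
  for (var i = 1; i < collection.length; i++) {
    texts.push(collection[i].textContent);
  }
  return texts;
}

// Hypothetical stand-in for the HTMLCollection: numbered entries plus a length.
var fakeCollection = {
  length: 4,
  0: { textContent: ' ' },
  1: { textContent: 'Salmon Bay School' },
  2: { textContent: 'Whitman Middle' },
  3: { textContent: 'Ballard High' }
};

var names = collectTextAfterHeader(fakeCollection);
console.log(names.join(' _ '));
```

The loop relies only on length and numeric indexes, which is all an array-like collection guarantees.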
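The school's name is in the object's textContent property. How did I find out what objects are inside the first object, and what the first span element contains? I looped through all of the object's properties.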
var elSchoolNames = document.getElementsByClassName("nearby-schools-name");
console.log('namesOfSchools: ' + elSchoolNames);
// List every property of the collection together with its value
for (var theProperty in elSchoolNames) {
  console.log('theProperties: ' + theProperty);
  console.log('each value: ' + elSchoolNames[theProperty]);
}
// Index 0 is the header's h4, so index 1 is the first span
var theSpanEl = elSchoolNames[1];
// List every property of that span element together with its value
for (var spanProperty in theSpanEl) {
  console.log('theProperties: ' + spanProperty);
  console.log('each value: ' + theSpanEl[spanProperty]);
}
console.log('textContent: ' + theSpanEl.textContent);
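To get the child elements, you need to take every element after the first one. Because the collection is zero-indexed, the second element is at index 1.
var theSpanEl = elSchoolNames[1];
Now, to see what you have, print it to the console:
console.log('textContent: ' + theSpanEl.textContent);
That gives you: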
textContent: Salmon Bay School
(assigned)
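Of course, you will need to strip the trailing (assigned) with a string method. You don't need .match() or a regex for any of this. I just realized that if you are fetching the HTML content from a site that isn't yours, and that HTML content is a string, then none of this works. Unless, that is, you inject the HTML into your own site with innerHTML and then use the code above.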
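To make the last two points concrete, here is a rough sketch of both cases. The first helper strips the (assigned) label from a textContent value using only string methods. The second handles HTML that exists only as a string (such as the question's contentText variable), where no DOM is available. It does fall back on a regular expression with a capture group, which the DOM approach above avoids. The function names cleanSchoolName and extractSchoolNames and the sample strings are hypothetical, and the pattern assumes markup shaped like the Zillow snippet in the question, so it would need adjusting if the page changes.

```javascript
// Remove the "(assigned)" label, then trim the surrounding whitespace.
function cleanSchoolName(text) {
  return text.replace('(assigned)', '').trim();
}

// Pull the link text out of every <a> tag whose class list contains
// "school-name" in an HTML string.
function extractSchoolNames(html) {
  var pattern = /<a\b[^>]*class="[^"]*\bschool-name\b[^"]*"[^>]*>([^<]*)<\/a>/g;
  var found = [];
  var m;
  // exec() with the g flag advances through the string one match at a time.
  while ((m = pattern.exec(html)) !== null) {
    found.push(m[1].trim()); // m[1] is the text between <a ...> and </a>
  }
  return found;
}

// Hypothetical sample inputs, shortened from the markup in the question.
var sampleText = ' Salmon Bay School\n(assigned)\n';
var sampleHtml =
  '<span class="nearby-schools-name"> <a href="/a/" class="ga-tracked-link school-name notranslate">Salmon Bay School</a></span>' +
  '<span class="nearby-schools-name"> <a href="/b/" class="ga-tracked-link school-name notranslate">Whitman Middle</a></span>';

var cleaned = cleanSchoolName(sampleText);
var extracted = extractSchoolNames(sampleHtml);
console.log(cleaned);
console.log(extracted.join(' _ '));
```

Capturing the anchor text directly means the (assigned) label never enters the result, since that label sits in a sibling span outside the link.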
Regarding "javascript - Extracting text from HTML tags via getContext - Google Apps Script - Spreadsheet", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24539931/