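I'm trying to cut some text out of a website I've scraped, and I'm not sure which functions or libraries would make this easier.
Here is a sample of the code I run from PhantomJS: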
var latest_release = page.evaluate(function () {
    // everything inside this function is executed inside our
    // headless browser, not PhantomJS.
    var links = $('[class="interesting"]');
    var releases = {};
    for (var i = 0; i < links.length; i++) {
        releases[links[i].innerHTML] = links[i].getAttribute("href");
    }
    // it's important to note that page.evaluate needs
    // to return a simple object, meaning DOM elements won't work.
    return JSON.stringify(releases);
});
The class "interesting" contains what I need, but it is surrounded by newlines, tabs and the like. Here is the result:
{"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null,"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null,"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null}
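I tried
string.slice("\n");
but nothing happened. I'd really like an efficient way to cut strings like this out based on where they sit relative to those \n and \t characters. By the way, here is my split code: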
var x = latest_release.split('\n');
Cheers.
Best answer
var interesting = {
    "\n\t\t\t\n\t\t\t\tI_Am_Interesting1\n\t\t\t\n\t\t": null,
    "\n\t\t\t\n\t\t\t\tI_Am_Interesting2\n\t\t\t\n\t\t": null,
    "\n\t\t\t\n\t\t\t\tI_Am_Interesting3\n\t\t\t\n\t\t": null
};
// Pull the first run of word characters (letters, digits, underscore) out of each key.
var found = [];
for (var x in interesting) {
    var match = x.match(/\w+/);
    if (match) {
        found.push(match[0]);
    }
}
alert(found);
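A possible alternative to the regular expression, in case the interesting text can contain spaces or punctuation that \w+ would cut short: String.prototype.trim() removes leading and trailing whitespace, including \n and \t. The sketch below is only an illustration; the sample data is made up to mirror the output in the question, and cleanKeys is a hypothetical helper name, not part of PhantomJS or jQuery. It also shows why string.slice("\n") appeared to do nothing: slice expects numeric indexes, and "\n" is coerced to 0, so the whole string comes back unchanged.

```javascript
// Hypothetical sample data mirroring the shape of the output in the question.
var releases = {
    "\n\t\t\t\n\t\t\t\tI_Am_Interesting1\n\t\t\t\n\t\t": null,
    "\n\t\t\t\n\t\t\t\tI_Am_Interesting2\n\t\t\t\n\t\t": null
};

// cleanKeys is an illustrative helper (an assumption, not a library function).
// It returns a new object whose keys have surrounding whitespace removed.
function cleanKeys(obj) {
    var cleaned = {};
    Object.keys(obj).forEach(function (key) {
        // trim() strips leading and trailing whitespace such as \n and \t.
        cleaned[key.trim()] = obj[key];
    });
    return cleaned;
}

var cleaned = cleanKeys(releases);
console.log(Object.keys(cleaned));

// slice() takes numeric start/end indexes; "\n" converts to the number 0,
// so this returns the original string untouched.
var unchanged = "abc".slice("\n");
console.log(unchanged === "abc");
```

Inside page.evaluate, the same idea could be applied at the point where the key is built, by trimming links[i].innerHTML before using it as the property name, assuming the page's JavaScript engine supports trim().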