I'm trying to cut some text out of a website I've scraped, and I'm not sure which functions or libraries I could use to make this easier.

Here is a sample of the code I run from PhantomJS:

    var latest_release = page.evaluate(function () {
        // Everything inside this function is executed inside our
        // headless browser, not PhantomJS.
        // $ is the page's own jQuery (assumes the page loads it).
        var links = $('[class="interesting"]');
        var releases = {};
        for (var i = 0; i < links.length; i++) {
            releases[links[i].innerHTML] = links[i].getAttribute("href");
        }

        // It's important to note that page.evaluate needs to
        // return a simple object, meaning DOM elements won't work.
        return JSON.stringify(releases);
    });
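One way to avoid the padding is to trim each element's text before it goes into the object. The sketch below assumes the leading and trailing whitespace is only markup formatting; buildReleases and the fake elements are hypothetical, made up so the loop can run outside a browser. Inside page.evaluate the loop body would have to be written in the callback itself, because PhantomJS runs that callback in the page, where functions defined in the outer script are not visible.

```javascript
// Hypothetical helper: builds the same name -> href map as the loop above,
// but trims the surrounding newlines/tabs from each element's innerHTML.
// Works on real DOM elements or any object exposing innerHTML and getAttribute().
function buildReleases(links) {
  var releases = {};
  for (var i = 0; i < links.length; i++) {
    // String.prototype.trim removes \n, \t and spaces at both ends.
    var name = links[i].innerHTML.trim();
    releases[name] = links[i].getAttribute("href");
  }
  return releases;
}

// Stand-in elements for running outside a browser (assumed shape, not real DOM nodes).
var fakeLinks = [
  { innerHTML: "\n\t\t\t\n\t\t\t\tI_Am_Interesting1\n\t\t\t\n\t\t", getAttribute: function () { return null; } },
  { innerHTML: "\n\t\tI_Am_Interesting2\n\t", getAttribute: function () { return "/r2"; } }
];

console.log(JSON.stringify(buildReleases(fakeLinks)));
// prints {"I_Am_Interesting1":null,"I_Am_Interesting2":"/r2"}
```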


The elements with class interesting contain what I need, but the text is surrounded by newlines, tabs and the like.

Here is the output:

    {"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null,"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null,"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null}


I tried string.slice("\n"); and nothing happened. What I really want is an efficient way to cut strings like this based on where those \n and \t characters are.
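The reason slice() appeared to do nothing is that it takes numeric start and end indices, not a delimiter. The string "\n" converts to the number 0, so slice("\n") is the same as slice(0) and returns a copy of the whole string. For stripping the padding, String.prototype.trim() or a regex replace is a better fit; the sample string below is copied from the output above.

```javascript
var raw = "\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t";

// slice() expects numeric indices; "\n" converts to the number 0,
// so this is just raw.slice(0) and returns the string unchanged.
var sliced = raw.slice("\n");

// trim() strips leading and trailing whitespace, including \n and \t.
var trimmed = raw.trim();

// If whitespace could also appear inside the text, a regex removes all of it.
var noWhitespace = raw.replace(/\s+/g, "");

console.log(sliced === raw); // true
console.log(trimmed);        // I_Am_Interesting
console.log(noWhitespace);   // I_Am_Interesting
```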

By the way, here is my split code:

    var x = latest_release.split('\n');
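Splitting latest_release on '\n' cannot help either: latest_release is the string produced by JSON.stringify, which escapes every newline as a backslash followed by the letter n, so the string contains no real newline characters. A minimal sketch, using a stand-in value in place of the real page.evaluate result: parse the JSON back into an object first, then trim each key.

```javascript
// Stand-in for the JSON string returned by page.evaluate (assumed sample data).
var latest_release = JSON.stringify({
  "\n\t\tI_Am_Interesting1\n\t": null,
  "\n\tI_Am_Interesting2\n\t\t": null
});

// JSON.stringify escapes each newline as a backslash followed by "n",
// so the serialized string contains no real newline characters.
console.log(latest_release.indexOf("\n"));      // -1
console.log(latest_release.split("\n").length); // 1

// Parse back into an object, then trim each key.
var releases = JSON.parse(latest_release);
var names = Object.keys(releases).map(function (key) {
  return key.trim();
});
console.log(names); // [ 'I_Am_Interesting1', 'I_Am_Interesting2' ]
```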


Cheers.

Best answer

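    var interesting = {
        "\n\t\t\t\n\t\t\t\tI_Am_Interesting1\n\t\t\t\n\t\t": null,
        "\n\t\t\t\n\t\t\t\tI_Am_Interesting2\n\t\t\t\n\t\t": null,
        "\n\t\t\t\n\t\t\t\tI_Am_Interesting3\n\t\t\t\n\t\t": null
    };

    // Collect the word characters from each key, discarding the \n and \t padding.
    var found = [];
    for (var key in interesting) {
        // match() with the g flag returns an array of every \w+ run in the key.
        found.push(key.match(/\w+/g));
    }
    alert(found); // shows I_Am_Interesting1,I_Am_Interesting2,I_Am_Interesting3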
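The \w+ approach works for the sample data because the names consist only of letters, digits and underscores. If a name could contain spaces or punctuation (the key below is a hypothetical example, not from the scraped page), match(/\w+/g) breaks it into pieces, whereas trim() removes only the surrounding whitespace and keeps the name whole:

```javascript
// Hypothetical key whose text contains a space, a dot and a hyphen.
var key = "\n\t\t\tRelease 1.2-beta\n\t\t";

// \w+ matches only runs of [A-Za-z0-9_], so the text is broken apart.
var words = key.match(/\w+/g);

// trim() only removes the surrounding whitespace and keeps the text intact.
var trimmedName = key.trim();

console.log(words);       // [ 'Release', '1', '2', 'beta' ]
console.log(trimmedName); // Release 1.2-beta
```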
