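I'm trying to build a string of a web page's content without the HTML syntax (probably replacing it with spaces, so the words don't all run together) or punctuation.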

So say you have the code:

    <body>
    <h1>Content:</h1>
    <p>paragraph 1</p>
    <p>paragraph 2</p>

    <script> alert("blah blah blah"); </script>

    This is some text<br />
    ....and some more
    </body>


I want to return the string:

    var content = "Content paragraph 1 paragraph 2 this is some text and this is some more";


Any ideas how to do this? Thanks.

Best answer

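Some browsers support the W3C DOM 3 Core textContent property, others support the MS/HTML5 innerText property (some support both). The content of script elements is probably unwanted, so it's better to walk the relevant part of the DOM tree recursively: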

    // Get the text within an element.
    // Doesn't do any normalising; returns a string
    // of the text as found, e.g. getTextRecursive(document.body).
    function getTextRecursive(element) {
      var text = [];
      var el, els = element.childNodes;

      for (var i = 0, iLen = els.length; i < iLen; i++) {
        el = els[i];

        // May need to add other node types here.
        // Element nodes: recurse, but exclude script element content.
        // Recurse by name: arguments.callee throws a TypeError in strict mode.
        if (el.nodeType == 1 && el.tagName && el.tagName.toLowerCase() != 'script') {
          text.push(getTextRecursive(el));

        // Text nodes. If working with XML, add nodeType 4 to get text from CDATA nodes.
        } else if (el.nodeType == 3) {

          // Deal with extra whitespace and returns in text here.
          text.push(el.data);
        }
      }
      return text.join('');
    }
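The function above leaves normalising out. A minimal sketch of that remaining step, assuming the goal stated in the question (tags become word breaks, punctuation is dropped, whitespace is collapsed). `getWords` and `normaliseText` are hypothetical names, not part of the original answer; also skipping `<style>` and treating every character that is not a letter, digit or whitespace as punctuation are assumptions. It only reads `nodeType`, `tagName`, `childNodes` and `data`, so it works on real DOM nodes and can also be tried on plain objects.

```javascript
// Hypothetical helpers, not part of the original answer.

// Collect text from text nodes, skipping <script> and <style> subtrees.
// Pieces are joined with a space, so text from adjacent elements never
// runs together (the "replace tags with spaces" idea from the question).
function getWords(node) {
  var parts = [];
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === 1) {
      var tag = (child.tagName || '').toLowerCase();
      if (tag !== 'script' && tag !== 'style') {
        parts.push(getWords(child));
      }
    } else if (child.nodeType === 3) {
      parts.push(child.data);
    }
  }
  return parts.join(' ');
}

// Replace punctuation with spaces, collapse runs of whitespace, trim.
// Assumption: "punctuation" = anything that is not a letter, digit or
// whitespace; \p{L} and \p{N} need ES2018 Unicode property escapes.
function normaliseText(text) {
  return text
    .replace(/[^\p{L}\p{N}\s]+/gu, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}
```

In a browser this would be called as `var content = normaliseText(getWords(document.body));`. For the example markup in the question it yields `Content paragraph 1 paragraph 2 This is some text and some more`; add `.toLowerCase()` if case should be ignored as well.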

Regarding javascript - javascript HTML from document.body.innerHTML, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/6687141/

10-09 19:50