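I'm trying to build a string of the contents of a web page, without the HTML syntax (probably replacing it with a space, so the words are not all run together) and without punctuation.
So say you have the code: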
<body>
<h1>Content:</h1>
<p>paragraph 1</p>
<p>paragraph 2</p>
<script> alert("blah blah blah"); </script>
This is some text<br />
....and some more
</body>
I'd like to return the string:
var content = "Content paragraph 1 paragraph 2 this is some text and this is some more";
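Any ideas how to do this? Thanks.
Best answer
Some browsers support the W3C DOM 3 Core textContent property, others support the MS / HTML5 innerText property (some support both). The content of script elements is probably unwanted, so it seems best to walk the relevant part of the DOM tree recursively: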
// Get the text within an element.
// Doesn't do any normalising; returns a string
// of text as found. Script element content is skipped.
function getTextRecursive(element) {
  var text = [];
  var el, els = element.childNodes;

  for (var i = 0, iLen = els.length; i < iLen; i++) {
    el = els[i];

    // May need to add other node types here
    // Element node (nodeType 1): recurse, excluding script element content
    if (el.nodeType === 1 && el.tagName && el.tagName.toLowerCase() !== 'script') {
      // Recurse by name; arguments.callee throws in strict mode
      text.push(getTextRecursive(el));

    // Text node (nodeType 3).
    // If working with XML, add nodeType 4 to get text from CDATA nodes
    } else if (el.nodeType === 3) {
      // Deal with extra whitespace and returns in text here.
      text.push(el.data);
    }
  }
  return text.join('');
}
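The function above returns the text exactly as found, so punctuation and line breaks are still present. A minimal sketch of the clean-up the question asks for might look like the following. The name normaliseText is hypothetical (it is not part of the original answer), and the pattern assumes the page text is plain ASCII; non-ASCII letters would be treated as punctuation and replaced by spaces.

```javascript
// Hypothetical helper, not from the original answer: tidy the raw text
// returned by getTextRecursive into a single space-separated string.
// Assumption: only ASCII letters and digits count as word characters.
function normaliseText(raw) {
  return raw
    // Replace every run of characters that are not ASCII letters,
    // digits or whitespace with a single space
    .replace(/[^A-Za-z0-9\s]+/g, ' ')
    // Collapse runs of whitespace (spaces, tabs, newlines) into one space
    .replace(/\s+/g, ' ')
    // Drop leading and trailing whitespace
    .trim();
}
```

In a browser this could be used as var content = normaliseText(getTextRecursive(document.body)); replacing punctuation with a space rather than deleting it keeps words on either side of a tag or punctuation mark from being joined together.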
Regarding javascript - javascript HTML from document.body.innerHTML, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6687141/