Splitting HTML code into tags and content

Problem description

Does anyone who knows more about regular expressions than I do know how to split up HTML code so that all tags and all words are separated? For example,

<p>Some content <a href="www.test.com">A link</a></p>

should be separated like this:

array = { [0]=>"<p>",
          [1]=>"Some",
          [2]=>"content",
          [3]=>"<a href='www.test.com'>",
          [4]=>"A",
          [5]=>"link",
          [6]=>"</a>",
          [7]=>"</p>" }

I've been using preg_split so far and have managed either to split the string by whitespace or to split it by tags, but then all the content ends up in one array element, when I need it to be split too.

Can anyone help me out?

Solution

preg_split shouldn't be used in that case. Try preg_match_all:

$text = '<p>Some content <a href="www.test.com">A link</a></p>';
preg_match_all('/<[^>]++>|[^<>\s]++/', $text, $tokens);
print_r($tokens);

Output:

Array
(
    [0] => Array
        (
            [0] => <p>
            [1] => Some
            [2] => content
            [3] => <a href="www.test.com">
            [4] => A
            [5] => link
            [6] => </a>
            [7] => </p>
        )

)

I assume you forgot to include the 'A' in 'A link' in your example.
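
To make the pattern easier to follow, here is a minimal sketch that wraps the same preg_match_all call in a helper and comments each alternative. The function name tokenize_html is hypothetical and not part of the original answer; the helper simply returns the flat list of full matches instead of the nested array that print_r shows above.

```php
<?php
// Hypothetical helper (name chosen for illustration only): returns the
// tags and words of $html as one flat array, in document order.
function tokenize_html(string $html): array
{
    // <[^>]++>   : "<", then one or more characters other than ">", then ">" (a tag)
    // [^<>\s]++  : one or more characters that are not "<", ">" or whitespace (a word)
    // "++" is a possessive quantifier: once matched, the engine does not backtrack into it.
    preg_match_all('/<[^>]++>|[^<>\s]++/', $html, $matches);

    // $matches[0] holds the full-pattern matches.
    return $matches[0];
}

$tokens = tokenize_html('<p>Some content <a href="www.test.com">A link</a></p>');
print_r($tokens);
```

Whitespace matches neither alternative, so it is skipped between tokens, which is why no empty or blank elements appear in the result.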

Be aware that if your HTML contains < or > characters that are not meant as the start or end of a tag, the regex will mess things up badly (hence the warnings).
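
The failure mode can be sketched with a short example. The input strings below are made up for illustration: the first contains a bare < inside the text, the second carries the same text with the character escaped as the entity &lt;.

```php
<?php
$pattern = '/<[^>]++>|[^<>\s]++/';

// A literal "<" in the text is taken as the start of a tag, so everything
// up to the next ">" is swallowed into one bogus token.
preg_match_all($pattern, '<p>if a < b then</p>', $broken);
print_r($broken[0]);   // last element is "< b then</p>"

// With the character escaped as an entity, the words and tags split as intended.
preg_match_all($pattern, '<p>if a &lt; b then</p>', $ok);
print_r($ok[0]);       // "<p>", "if", "a", "&lt;", "b", "then", "</p>"
```

A stray > in the text behaves differently: it matches neither alternative and is silently dropped from the result. If the input cannot be trusted to escape these characters, a real HTML parser is likely the safer choice.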
