Get the first 100 characters of HTML content without stripping tags

Problem description

There are lots of questions about how to strip HTML tags, but not many about functions or methods to close them.

Here's the situation. I have a 500-character message summary (which includes HTML tags), but I only want the first 100 characters. The problem is that if I truncate the message, the cut could land in the middle of an HTML tag, which breaks the markup.
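To make the failure concrete, here is a small Python sketch of what a plain string slice does to markup. The sample string is a shortened, made-up variant of the HTML in this question, and `ends_inside_tag` is a hypothetical helper written only for this illustration.

```python
def ends_inside_tag(fragment):
    """Return True when the fragment stops after a "<" that is never closed by ">"."""
    return fragment.rfind("<") > fragment.rfind(">")


# Shortened sample, assumed for illustration only.
sample = ('<div class="bd">Lorem ipsum '
          '<a href="http://www.somesite.com" title="Some Conference">Some text link</a></div>')

# Naive truncation: counts tag characters and can stop anywhere.
naive_cut = sample[:40]

print(naive_cut)
print(ends_inside_tag(naive_cut))
```

With this sample the slice stops partway through the `href` attribute, so the fragment ends in a half-written `<a` tag and also leaves the `<div>` unclosed.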

Assuming the HTML is something like this:

<div class="bd">"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <br/>
 <br/>Some Dates: April 30 - May 2, 2010 <br/>
 <p>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. <em>Duis aute irure dolor in reprehenderit</em> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. <br/>
 </p>
 For more information about Lorem Ipsum doemdloe, visit: <br/>
 <a href="http://www.somesite.com" title="Some Conference">Some text link</a><br/>
</div>

How would I take the first ~100 characters or so? Ideally, that would be approximately the first 100 characters of content (the text in between the HTML tags).

I'm assuming the best way to do this would be a recursive algorithm that keeps track of the HTML tags and appends closing tags for any that the truncation would leave open, but that may not be the best approach.

My first thought is to use recursion to count nested tags and, when we reach 100 characters, look for the next "<" and then use recursion to write the closing HTML tags needed from there.
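One possible reading of that idea, sketched in Python with an explicit stack in place of recursion. This is only a rough illustration, not a robust parser: the names `truncate_by_scanning`, `TOKEN`, `TAG_NAME`, and `VOID_TAGS` are invented here, the regular expressions assume reasonably tidy markup (no ">" inside attribute values), and entities such as `&amp;` are counted as several characters.

```python
import re

# One token is either a whole tag or a run of text between tags.
TOKEN = re.compile(r"<[^>]*>|[^<]+")
TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)")
# Assumed subset of elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def truncate_by_scanning(markup, limit=100):
    out = []    # pieces of the truncated fragment
    stack = []  # names of tags opened but not yet closed
    count = 0   # text characters emitted so far
    for match in TOKEN.finditer(markup):
        token = match.group(0)
        if token.startswith("<"):
            name = TAG_NAME.match(token)
            if name is None:
                continue  # comments and doctypes are dropped in this sketch
            tag = name.group(1).lower()
            if token.startswith("</"):
                # Only accept a closing tag that matches the innermost open tag.
                if stack and stack[-1] == tag:
                    stack.pop()
                    out.append(token)
            elif tag in VOID_TAGS or token.endswith("/>"):
                out.append(token)  # nothing to close later
            else:
                stack.append(tag)
                out.append(token)
        else:
            piece = token[: limit - count]
            out.append(piece)
            count += len(piece)
            if count >= limit:
                break
    # Write the closing tags still needed, innermost first.
    out.extend(f"</{tag}>" for tag in reversed(stack))
    return "".join(out)


print(truncate_by_scanning('<div class="bd">Hello <em>world</em> again</div>', 8))
```

Only text between tags counts toward the limit, so the cut can never fall inside a tag; the stack records which closing tags still have to be written once the limit is reached.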

The reason for doing this is to make a short summary of existing articles without requiring the user to go back and provide summaries for all the articles. I want to keep the HTML formatting, if possible.

Note: please ignore the fact that the HTML is not entirely semantic. This is what I have to deal with from my WYSIWYG editor.

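I've added a potential solution (that seems to work), since I figure others will run into this problem as well. I don't know that it's the best one, and it's probably not completely robust (in fact, I know it isn't), but I'd appreciate any feedback.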

Recommended answer

My suggestion would be to find an HTML-friendly traverser (one that lets you traverse HTML like XML) and then, starting from the beginning, ignore the tags themselves and count only the data inside the tags. Count that towards your limit and then, once it is reached, just close out each open tag (I can't think of any tags whose closing form is not just /whatever, where whatever is the tag name).

This should work reasonably well and be fairly close to what you are looking for.

It's totally off the top of the ol' noggin, so I am assuming that there will be some tricky parts, like attribute values that display (such as link tag values).
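As a rough sketch of the suggestion above, the following uses Python's standard `html.parser.HTMLParser` as a stand-in for the "HTML-friendly traverser". The answer does not name a library or language, so that choice is an assumption, and `TruncatingParser`, `truncate_html`, and `VOID_TAGS` are names made up for this example. Tags are copied through unchanged, only text data counts toward the limit, and any tags still open at the cut are closed. One detail the sketch has to handle is void elements such as `<br>`, which take no closing tag.

```python
from html import escape
from html.parser import HTMLParser

# Assumed subset of elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TruncatingParser(HTMLParser):
    """Copies tags through unchanged and counts only text characters."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit  # text characters still allowed
        self.open_tags = []     # tags opened but not yet closed
        self.out = []
        self.done = False       # True once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # original tag text, attributes included
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: nothing to close later.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close everything up to and including the matching open tag.
        while self.open_tags:
            closed = self.open_tags.pop()
            self.out.append(f"</{closed}>")
            if closed == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[: self.remaining]
        self.out.append(escape(piece, quote=False))  # re-escape &, <, >
        self.remaining -= len(piece)
        if self.remaining <= 0:
            self.done = True

    def result(self):
        closing = "".join(f"</{tag}>" for tag in reversed(self.open_tags))
        return "".join(self.out) + closing


def truncate_html(markup, limit=100):
    parser = TruncatingParser(limit)
    parser.feed(markup)
    parser.close()
    return parser.result()


print(truncate_html('<div class="bd">Hello <em>world</em> again</div>', 8))
```

Because the parser decodes entities before `handle_data` sees them, `&amp;` counts as one character here. Whitespace between tags also counts toward the limit, which may or may not be what a summary needs.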

08-13 06:55