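Problem description
I'm looking for a spec on handling HTML entities in the href attribute of <a> tags. So far, no luck (I might be searching for something too specific).
In detail:
The bug I'm trying to fix is part of the cheerio project (https://github.com/MatthewMueller/cheerio).
Some entities don't require a semicolon at the end. One of them is &curren. Anyway, this leads to problems when a source links to /test/example.jsp?item=123&currentSize=S&currentQty=1, because &curren is decoded to ¤ and the link becomes /test/example.jsp?item=123¤tSize=S¤tQty=1.
Browsers (at least Chrome) handle this nicely. I still haven't figured out why though.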
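Regarding HTML up to and including HTML 4.01, see @Quentin's answer.
Regarding any flavor of XHTML, including HTML5 in XHTML serialization, &currentSize= contains a well-formedness error, so any display of the document is aborted (when the document is processed as truly XHTML).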
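The XHTML case can be checked with any conforming XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample markup are illustrative assumptions, not part of cheerio or of the original question.

```python
import xml.etree.ElementTree as ET

# Sample markup (hypothetical) with an unescaped ampersand in the href value.
markup = '<a href="/test/example.jsp?item=123&currentSize=S">link</a>'


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # An XML parser must stop here: "&currentSize=" is not a valid reference.
        return False
    return True


print(is_well_formed(markup))                         # False
print(is_well_formed(markup.replace("&", "&amp;")))   # True
```

Escaping the ampersand as &amp; is what makes the same link acceptable to an XML parser.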
In HTML5 in HTML serialization, there are tricky ad hoc rules for parsing character references. They imply that in text content, &currentSize= would be parsed as if it were written &curren;tSize=, i.e. as ¤tSize=. But within an attribute value, as in <a href="...">, the reference is, under certain conditions, not recognized, since it is not terminated by a semicolon.
Specifically, the conditions described there are: "If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or in the range ASCII digits, uppercase ASCII letters, or lowercase ASCII letters, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned."
So no &foobar= will be recognized in an attribute value, even if foobar is a defined name. The reason is that authors have widely written URLs in attribute values without escaping &, and browsers have adapted to this.
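To make the quoted rule concrete, here is a rough sketch in Python of how a decoder might apply it to named references. It is not cheerio's code and not a full implementation of the HTML5 tokenizer: the function decode_named_refs and its in_attribute flag are hypothetical names, numeric references are ignored, and the table of names is taken from Python's standard html.entities.html5 mapping.

```python
import re
from html.entities import html5  # maps names such as "curren" and "curren;" to characters

# An ampersand followed by a run of ASCII alphanumerics and an optional semicolon.
_CANDIDATE = re.compile(r"&([A-Za-z0-9]+;?)")


def decode_named_refs(text: str, in_attribute: bool) -> str:
    """Decode named character references, honoring the attribute-value exception."""

    def replace(match: re.Match) -> str:
        candidate = match.group(1)
        # Try the longest prefix of the candidate that is a known reference name.
        for end in range(len(candidate), 0, -1):
            name = candidate[:end]
            if name not in html5:
                continue
            rest = candidate[end:]
            # The character that follows the matched name in the input.
            next_char = rest[:1] or text[match.end():match.end() + 1]
            if (
                in_attribute
                and not name.endswith(";")
                and (next_char == "=" or (next_char.isascii() and next_char.isalnum()))
            ):
                # Historical exception: leave the ampersand and the name untouched.
                return match.group(0)
            return html5[name] + rest
        return match.group(0)  # no known name: keep the text as written

    return _CANDIDATE.sub(replace, text)


url = "/test/example.jsp?item=123&currentSize=S&currentQty=1"
print(decode_named_refs(url, in_attribute=True))   # unchanged
print(decode_named_refs(url, in_attribute=False))  # /test/example.jsp?item=123¤tSize=S¤tQty=1
```

With in_attribute=True the URL survives intact, which matches what browsers do for href values; with in_attribute=False the same string is decoded the way text content would be.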