Problem description

Is the following enough to prevent XSS from inside HTML elements?

function XSS_encode_html ( $str )
{
    // Escape & first so the ampersands introduced below are not escaped twice.
    $str = str_replace ( '&', "&amp;", $str );
    $str = str_replace ( '<', "&lt;", $str );
    $str = str_replace ( '>', "&gt;", $str );
    $str = str_replace ( '"', "&quot;", $str );
    $str = str_replace ( '\'', "&#x27;", $str );
    $str = str_replace ( '/', "&#x2F;", $str );

    return $str;
}

As mentioned here:
https://www.owasp.org/index.php/Abridgevention_Sheet_Cheat_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content

Edit

I'm not using htmlspecialchars() because:

  1. It does not change / to &#x2F;
  2. ' (single quote) becomes &#039; (or &apos;) when ENT_QUOTES is set.

According to OWASP, ' (single quote) should become &#x27; (call me pedantic), and &apos; is not recommended because it's not in the HTML spec (this holds for HTML 4; HTML5 does define &apos;).
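The htmlspecialchars() behaviour described in the list above can be checked with a short sketch. It assumes PHP 5.4 or later, where the ENT_HTML401 and ENT_HTML5 flags exist; the sample string and variable names are made up for illustration.

```php
<?php
// Sample input (hypothetical) containing a single quote, a slash and double quotes.
$input = "O'Reilly / \"x\"";

// HTML 4.01 mode: the single quote becomes &#039; and the slash is left alone.
$html4 = htmlspecialchars($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// HTML5 mode: the single quote becomes &apos; instead.
$html5 = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $html4, "\n"; // O&#039;Reilly / &quot;x&quot;
echo $html5, "\n"; // O&apos;Reilly / &quot;x&quot;
```

In both modes the / is passed through unchanged, which is the first point of the list.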

Recommended answer

Inside the content of an element, the only character that can be harmful is the start-tag delimiter <, as it may denote the start of some markup declaration, whether that is a start tag, an end tag, or a comment. So that character should always be escaped.

The other characters do not necessarily need to be escaped inside the content of an element.

Quotes only need to be escaped inside tags, specifically in attribute values that are either wrapped in the same kind of quote or not quoted at all. Similarly, the markup declaration close delimiter > only needs to be escaped inside tags, and there only when it is used in an unquoted attribute value. However, escaping plain ampersands as well is recommended, to avoid them being mistaken for the start of a character reference.
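As a rough illustration of the difference between the two contexts, here is a minimal sketch built on htmlspecialchars(). The helper names escape_element_content and escape_attribute_value are hypothetical, and the sketch assumes UTF-8 input and attribute values that are always quoted.

```php
<?php
// Hypothetical helper for element content: escapes &, < and > but leaves quotes alone.
// (Escaping > is not strictly needed here, but htmlspecialchars() always does it.)
function escape_element_content(string $text): string
{
    return htmlspecialchars($text, ENT_NOQUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

// Hypothetical helper for a quoted attribute value: also escapes " and '.
function escape_attribute_value(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

$input = '<b onclick="x">Tom & "Jerry"</b>';

echo '<p>' . escape_element_content($input) . "</p>\n";
echo '<input value="' . escape_attribute_value($input) . "\">\n";
```

Neither helper is sufficient for an unquoted attribute value, where whitespace and several other characters would also end the value.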

Now, as for the reason to replace / as well: it may be due to a feature of SGML, the markup language HTML is adapted from, which allowed a so-called null end-tag:

To see how null end-tags work in practice, consider their use with an element that can be defined as:

<!ELEMENT ISBN  - -  CDATA --ISBN number-- >

Instead of entering an ISBN number as:

<ISBN>0 201 17535 5</ISBN>

we can use the null end-tag option to enter the element in the shortened form:

<ISBN/0 201 17535 5/

However, I've never seen this feature implemented by any browser. HTML's syntax rules have always been stricter than SGML's.

Another, more probable reason is the content model of the so-called raw text elements (script and style), which is plain text with the following restriction:

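The text in raw text and RCDATA elements must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element, followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F).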

This says that inside raw text elements such as script, an occurrence of </script/ would denote the end tag:

<script>
alert(0</script/.exec("script").index)
</script>

Although this is perfectly valid JavaScript code, the end tag would be denoted by </script/. But besides that, the / does not do any harm. And if you allow arbitrary input to be used in a JavaScript context with only HTML escaping applied, you are already doomed.
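The answer does not say what to do instead in a JavaScript context. One common approach, shown here only as a sketch and not taken from the answer, is to serialise the value with json_encode() and its JSON_HEX_* flags so that no < can appear inside the script element. The helper name js_literal is hypothetical, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: turn a PHP value into a JavaScript literal that is safe
// to place inside an inline <script> element.
function js_literal($value): string
{
    // JSON_HEX_TAG rewrites < and > as \u003C and \u003E, so a sequence such as
    // </script/ can never appear in the output. The other JSON_HEX_* flags do
    // the same for &, ' and ".
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Untrusted input (made up) that would close the script element if emitted raw.
$untrusted = '</script/><script>alert(1)</script>';

echo "<script>\nvar data = " . js_literal($untrusted) . ";\n</script>\n";
```

The value still round-trips: JavaScript reads the \u003C escapes back as the original characters, but the HTML parser never sees them as markup.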

By the way, it doesn't matter which kind of character reference is used to escape these characters: a named character reference (i.e. an entity reference) or a numeric character reference, in either decimal or hexadecimal notation. They all reference the same characters.
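This equivalence can be checked by decoding the different forms and comparing the results. The sketch below uses html_entity_decode() in HTML5 mode (so that &apos; is recognised); the helper name all_decode_to is hypothetical.

```php
<?php
// Hypothetical helper: true if every character reference in $refs decodes to $char.
function all_decode_to(array $refs, string $char): bool
{
    foreach ($refs as $ref) {
        if (html_entity_decode($ref, ENT_QUOTES | ENT_HTML5, 'UTF-8') !== $char) {
            return false;
        }
    }
    return true;
}

// Named, decimal and hexadecimal forms of the same characters.
var_dump(all_decode_to(['&apos;', '&#39;', '&#039;', '&#x27;'], "'"));
var_dump(all_decode_to(['&lt;', '&#60;', '&#x3C;'], '<'));
var_dump(all_decode_to(['&#47;', '&#x2F;'], '/'));
```

Each call should print bool(true), which is why the choice between &#039;, &#x27; and &apos; is a matter of spec compliance rather than of safety.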
