Question
For HTML input, I want to neutralize all HTML elements that have inline JS (onclick="..", onmouseout="..", etc.). I am thinking, isn't it enough to encode the following characters? =, (, )
So onclick="location.href='ggg.com'"
will become onclick%3D"location.href%3D'ggg.com'"
What am I missing here?
I do need to accept active HTML (I can't escape it all or convert it to entities).
Recommended answer
There's no simple method to accept HTML, but not scripts.
You have to parse the HTML into a DOM, remove all unwanted elements and attributes from the DOM, and generate new HTML from what is left.
It can't be done reliably with regular expressions.
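To make the parse, filter, and regenerate idea concrete, here is a minimal sketch in Python. It is not the answerer's code, and it uses the event stream of the standard library's html.parser rather than a full DOM tree. The allowlists and the names AllowlistSanitizer and sanitize are assumptions made up for illustration. It does not rebalance unclosed tags, so a maintained sanitizer library is likely the better choice for real input.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; a real policy would differ.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "a"}
ALLOWED_ATTRS = {"title"}  # URL-bearing attributes are left out on purpose
VOID_TAGS = {"br"}
# Elements whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Rebuilds HTML from parser events, keeping only allowlisted parts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        # Every attribute not on the allowlist (on*, style, href, ...) is dropped.
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so it can never turn back into markup.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="alert(1)">hi <b>there</b></p><script>alert(2)</script>'))
```

The design choice here is an allowlist rather than a blocklist: anything the policy does not name is discarded, so an unfamiliar attribute or element fails closed instead of slipping through.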
Removing on* attributes is not enough. Scripts can be embedded in style, src, href and other attributes.
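As an illustration of why href and src need their own handling, the sketch below checks the scheme of a URL attribute value against an allowlist. The function name is_safe_url and the set of allowed schemes are hypothetical, and the value is assumed to have been entity-decoded by the HTML parser already. The style attribute is not covered here; dropping it entirely is the simplest option.

```python
import re

# Hypothetical policy: which URL schemes are acceptable in href/src.
ALLOWED_SCHEMES = {"http", "https", "mailto"}
_SCHEME_RE = re.compile(r"^([a-z][a-z0-9+.\-]*):", re.IGNORECASE)


def is_safe_url(value):
    # Browsers ignore leading whitespace and strip tabs/newlines inside URLs,
    # so "java\tscript:" still runs. Removing every control character and
    # space before checking is a deliberately conservative normalization.
    cleaned = re.sub(r"[\x00-\x20]", "", value)
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme at all: treated as a relative URL
    return match.group(1).lower() in ALLOWED_SCHEMES


for candidate in ["https://example.com/", "/local/page",
                  " JavaScript:alert(1)", "java\tscript:alert(1)",
                  "data:text/html,<script>alert(1)</script>"]:
    print(repr(candidate), is_safe_url(candidate))
```

None of the rejected values contains an on* attribute, which is why stripping event handlers alone leaves the input exploitable.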
If you're using PHP, then use HTML Purifier.