HTML Purifier: Removing an element conditionally based on its attributes

Problem description

As per the HTML Purifier smoketest, 'malformed' URIs are occasionally discarded to leave behind an attribute-less anchor tag, e.g.

<a href="javascript:document.location='http://www.google.com/'">XSS</a> becomes <a>XSS</a>

...as well as occasionally being stripped down to the protocol, e.g.

<a href="http://1113982867/">XSS</a> becomes <a href="http:/">XSS</a>

While that's unproblematic, per se, it's a bit ugly. Instead of trying to strip these out with regular expressions, I was hoping to use HTML Purifier's own library capabilities / injectors / plug-ins / whathaveyou.

Point of reference: Handling attributes

Conditionally removing an attribute in HTMLPurifier is easy. Here the library offers the class HTMLPurifier_AttrTransform with the method confiscateAttr().

While I don't personally use the functionality of confiscateAttr(), I do use an HTMLPurifier_AttrTransform as per this thread to add target="_blank" to all anchors.

// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor  = $htmlDef->addBlankElement('a');
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_Target();
// purify down here

HTMLPurifier_AttrTransform_Target is a very simple class, of course.

class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context) {
        // I could call $this->confiscateAttr() here to throw away an
        // undesired attribute
        $attr['target'] = '_blank';
        return $attr;
    }
}

That part works like a charm, naturally.

Handling elements

Perhaps I'm not squinting hard enough at HTMLPurifier_TagTransform, or am looking in the wrong place(s), or generally amn't understanding it, but I can't seem to figure out a way to conditionally remove elements.

Say, something to the effect of:

// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor  = $htmlDef->addElementHandler('a');
$anchor->elem_transform_post[] = new HTMLPurifier_ElementTransform_Cull();
// add target as per 'point of reference' here
// purify down here

With the Cull class extending something that has a confiscateElement() ability, or comparable, wherein I could check for a missing href attribute or a href attribute with the content http:/.

HTMLPurifier_Filter

I understand I could create a filter, but the examples (Youtube.php and ExtractStyleBlocks.php) suggest I'd be using regular expressions in that, which I'd really rather avoid, if it is at all possible. I'm hoping for an onboard or quasi-onboard solution that makes use of HTML Purifier's excellent parsing capabilities.

Returning null in a child-class of HTMLPurifier_AttrTransform unfortunately doesn't cut it.

Anyone have any smart ideas, or am I stuck with regexes? :)

Solution

Success! Thanks to Ambush Commander and mcgrailm in another question, I am now using a hilariously simple solution:

// a bit of context
$htmlDef = $this->configuration->getHTMLDefinition(true);
$anchor  = $htmlDef->addBlankElement('a');

// HTMLPurifier_AttrTransform_RemoveLoneHttp strips 'href="http:/"' from
// all anchor tags (see first post for class detail)
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_RemoveLoneHttp();

// this is the magic! We're making 'href' a required attribute (note the
// asterisk) - now HTML Purifier removes <a></a>, as well as
// <a href="http:/"></a> after HTMLPurifier_AttrTransform_RemoveLoneHttp
// is through with it!
$htmlDef->addAttribute('a', 'href*', new HTMLPurifier_AttrDef_URI());
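
The class HTMLPurifier_AttrTransform_RemoveLoneHttp is referred to above but not shown in this thread, so here is a minimal sketch of what it could look like. It is my guess at the shape, not the original class: it assumes the leftover value is exactly 'http:/' (as in the example in the question) and uses confiscateAttr() from HTMLPurifier_AttrTransform to drop it. The stub base class at the top is purely hypothetical scaffolding so the snippet runs without the library; with HTML Purifier loaded it is skipped.

```php
<?php
// In a real project, load HTML Purifier first (for example through
// Composer's autoloader). The stub below is an assumption-only fallback
// so this sketch can run standalone; it mimics the two members used here.
if (!class_exists('HTMLPurifier_AttrTransform')) {
    abstract class HTMLPurifier_AttrTransform
    {
        abstract public function transform($attr, $config, $context);

        // Removes $key from $attr and returns its former value (or null).
        public function confiscateAttr(&$attr, $key)
        {
            if (!isset($attr[$key])) {
                return null;
            }
            $value = $attr[$key];
            unset($attr[$key]);
            return $value;
        }
    }
}

// Sketch only: drops an href that has been reduced to the bare 'http:/'.
class HTMLPurifier_AttrTransform_RemoveLoneHttp extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context)
    {
        // Assumption: the stripped-down value is exactly 'http:/'.
        if (isset($attr['href']) && $attr['href'] === 'http:/') {
            $this->confiscateAttr($attr, 'href');
        }
        // Every other attribute, and any other href, passes through untouched.
        return $attr;
    }
}
```

Once the href is gone, the anchor has no href left, and that is the case the required-attribute line above is meant to catch.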

It works, it works, bahahahaHAHAHAHAnhͥͤͫ̀ğͮ͑̆ͦó̓̉ͬ͋h́ͧ̆̈́̉ğ̈́͐̈a̾̈́̑ͨô̔̄̑̇g̀̄h̘̝͊̐ͩͥ̋ͤ͛g̦̣̙̙̒̀ͥ̐̔ͅo̤̣hg͓̈́͋̇̓́̆a͖̩̯̥͕͂̈̐ͮ̒o̶ͬ̽̀̍ͮ̾ͮ͢҉̩͉̘͓̙̦̩̹͍̹̠̕g̵̡͔̙͉̱̠̙̩͚͑ͥ̎̓͛̋͗̍̽͋͑̈́̚...! * manic laughter, gurgling noises, keels over with a smile on her face *

