How to use RubyGem Sanitize transformers to sanitize an unordered list into a comma-separated list?



Is anyone familiar with the RubyGem Sanitize who can provide an example of building a "Transformer" to convert

<ul><li>a</li><li>b</li><li>c</li></ul>

into

"a, b, and c"?



IMO transformers are not meant for pulling out data like this. They exist to filter and alter nodes as they are sanitized, which is not what you're trying to do: you're trying to pull data out of nodes and transform it. In your example, you're not doing the same thing to each element: sometimes you append a comma, and sometimes a comma and the word "and".

In order to do that, you either need to save state and post-process, or look ahead in the node stream to see if you're visiting the last node. I don't know of a trivial way to do that with Sanitize's transformers, so this example saves state and post-processes.

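require 'sanitize'

items = []
s = "<ul><li>some space</li><li>more stuff with spaces</li><li>last one</li></ul>"

# Transformer run only for its side effect: collect the text of each text node.
save_li = lambda do |env|
  node = env[:node]
  items << node.text.strip if node.text?
end

# Sanitize.clean is the API of the Sanitize version this answer was written for;
# newer releases (3.0+) use Sanitize.fragment instead.
Sanitize.clean(s, :transformers => save_li)
# => "  some space  more stuff with spaces  last one  "

# Post-process the saved state into the final string.
output = "#{items[0..-2].join(", ")}, and #{items[-1]}"
# => "some space, more stuff with spaces, and last one"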
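The interpolation in the post-processing step assumes there are at least three items; with one or two items it produces a stray ", and". A minimal sketch of a more forgiving join, in plain Ruby with no Sanitize involved. The name to_comma_list is a hypothetical helper, not part of any library:

```ruby
# Hypothetical helper (not part of Sanitize): join items as "a, b, and c",
# handling the short-list cases that the one-line interpolation gets wrong.
def to_comma_list(items)
  case items.length
  when 0 then ""
  when 1 then items.first
  when 2 then items.join(" and ")
  else "#{items[0..-2].join(', ')}, and #{items.last}"
  end
end

puts to_comma_list(["some space", "more stuff with spaces", "last one"])
# prints: some space, more stuff with spaces, and last one
puts to_comma_list(%w[a b])
# prints: a and b
```

Whether two items should read "a and b" or "a, and b" is a style choice; adjust the `when 2` branch to taste.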

IMO this example is an abuse of transformers because it's being run only for its side effect; it does nothing other than look for text nodes.

If one of the list items has embedded HTML, the naive approach no longer works, and you need to start knowing more Nokogiri anyway:

items = []
s = "<ul><li>some space</li><li>item <b>with</b> html</li><li>c</li></ul>"

# Collect the full text content of each <li>, including text inside nested tags.
save_li = lambda do |env|
  node = env[:node]
  items << node.content if node.name == "li"
end

Sanitize.clean(s, :transformers => save_li)
# => "  some space  item with html  c  "

output = "#{items[0..-2].join(", ")}, and #{items[-1]}"
# => "some space, item with html, and c"

This approach relies on the default Sanitize behavior of nothing being whitelisted. The <b> tags are still visited by the save_li lambda, but they're stripped from the output. This has the potential to cause issues under a variety of circumstances.

