This article looks at why Nokogiri percent-encodes the $ character in URLs.

Problem description

Why do I get:

Nokogiri::HTML('<a href="/test_$4b.html">test</a>').to_html

=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><a href=\"/test_%244b.html\">test</a></body></html>\n"

I thought the $ symbol was valid in a URL?

Follow-up:

Why do browsers handle this differently? For example, in the page: http://www.pmlive.com/pharma_news/its_on_shire_and_abbvie_agree_32bn_takeover_586969

The link: http://www.pmlive.com/pharma_news/mylan_buys_abbotts_non-us_generics_in_$5.3bn_deal_585883 works.

But Nokogiri would serialize this link as: http://www.pmlive.com/pharma_news/mylan_buys_abbotts_non-us_generics_in_%245.3bn_deal_585883 which does not work (returns 404).

Are browsers deciding that a literal $ is actually safe and the better choice?

Solution

RFC 3986 lists the dollar sign as a reserved sub-delimiter (page 12).

reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

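It also recommends how reserved characters should be handled:

2.2. Reserved Characters

URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.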
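
To make the RFC 3986 rule concrete, here is a minimal Ruby sketch that checks whether a character is in the reserved set and shows its percent-encoded form. The constant and method names (GEN_DELIMS, SUB_DELIMS, reserved?, percent_encode) are made up for illustration and are not part of Nokogiri or the Ruby standard library.

```ruby
# Reserved character sets as listed in RFC 3986, section 2.2.
# These constant names are illustrative, not a library API.
GEN_DELIMS = [':', '/', '?', '#', '[', ']', '@'].freeze
SUB_DELIMS = ['!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '='].freeze
RESERVED   = (GEN_DELIMS + SUB_DELIMS).freeze

# True if the single character is in the RFC 3986 reserved set.
def reserved?(char)
  RESERVED.include?(char)
end

# Percent-encode every byte of the string as %XX (uppercase hex).
def percent_encode(char)
  char.bytes.map { |byte| format('%%%02X', byte) }.join
end

puts reserved?('$')        # true
puts reserved?('a')        # false
puts percent_encode('$')   # %24
```

This is why the $ in the href shows up as %24 in the serialized output: 0x24 is the byte value of the dollar sign.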

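The authors of Nokogiri likely decided that, since their library may be used by anyone for any purpose, there is no way to automatically determine whether a reserved character would conflict or not. The "safest" way to handle it (short of testing a URI directly) is therefore to escape it, as the RFC recommends.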
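
If the server only answers to the literal form, as in the follow-up above, one possible workaround is to turn percent-encoded sub-delimiters back into literal characters after serialization. The sketch below is an assumption about how one might do that with plain Ruby string handling; restore_sub_delims is a hypothetical helper, not a Nokogiri method, and it is only appropriate when you know the original markup used the literal characters.

```ruby
# Sub-delimiters from RFC 3986, section 2.2 (illustrative constant name).
SUB_DELIMITERS = ['!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '='].freeze

# Hypothetical helper: decode %XX sequences only when they stand for a
# sub-delimiter; every other escape (e.g. %20, %2F) is left untouched.
def restore_sub_delims(href)
  href.gsub(/%(\h{2})/) do |escape|
    char = Regexp.last_match(1).hex.chr
    SUB_DELIMITERS.include?(char) ? char : escape
  end
end

puts restore_sub_delims('/test_%244b.html')
# /test_$4b.html
puts restore_sub_delims('/a%20b_%245.3bn')
# /a%20b_$5.3bn
```

Note that an href which genuinely contained %24 in the source markup would also be rewritten, so this is a trade-off rather than a general fix.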



08-23 01:56