Problem description
I've been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug:
encodeURIComponent('"A" B ± "');
Then I get:
"%22A%22%20B%20%C2%B1%20%22"
Here's my little test Java program:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String s = "\"A\" B ± \"";
        // Form-style encoding: spaces become '+', not "%20".
        System.out.println("URLEncoder.encode returns "
            + URLEncoder.encode(s, "UTF-8"));
        // Reinterprets the UTF-8 bytes as ISO-8859-1; no percent-encoding happens.
        System.out.println("getBytes returns "
            + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
    }
}
This program outputs:
URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B Â± "
Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?
Edit: I'm using Java 1.4, moving to Java 5 shortly.
Recommended answer
Looking at the implementation differences, I see that:
JavaScript's encodeURIComponent:
- literal characters (regex representation): [-a-zA-Z0-9._*~'()!]
Java 1.5.0 documentation on URLEncoder:
- literal characters (regex representation): [-a-zA-Z0-9._*]
- the space character " " is converted into a plus sign "+".
So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:
- replace all occurrences of "+" with "%20"
- replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
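
The two post-processing steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the class name EncodeUriComponentSketch and the method name encodeURIComponent are made up for this example, and the code relies on URLEncoder emitting uppercase hex digits, as it does in the output shown in the question. String.replaceAll is used because it is available from Java 1.4 onwards.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, named for this sketch only.
public class EncodeUriComponentSketch {

    // Encodes s with URLEncoder, then undoes the differences from
    // JavaScript's encodeURIComponent listed above.
    public static String encodeURIComponent(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replaceAll("\\+", "%20")  // form-style space back to %20
                    .replaceAll("%21", "!")
                    .replaceAll("%27", "'")
                    .replaceAll("%28", "(")
                    .replaceAll("%29", ")")
                    .replaceAll("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // \u00B1 is the plus-minus sign from the torture test string.
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```

The replacements should be safe to apply blindly because a literal "+" or "%" in the input is itself percent-encoded by URLEncoder (as %2B and %25), so a "+" or "%21" in the encoded output can only have come from a space or a "!".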