

I've been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.


My torture test string is: "A" B ± "


If I enter the following JavaScript statement in Firebug:

encodeURIComponent('"A" B ± "');


Then I get:

%22A%22%20B%20%C2%B1%20%22

Here's my little test Java program:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
  public static void main(String[] args) throws UnsupportedEncodingException
  {
    String s = "\"A\" B ± \"";

    // Attempt 1: the standard form encoder
    System.out.println("URLEncoder.encode returns "
      + URLEncoder.encode(s, "UTF-8"));

    // Attempt 2: UTF-8 bytes reinterpreted as ISO-8859-1 characters
    System.out.println("getBytes returns "
      + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
  }
}

This program outputs:

URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B Â± "

Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?

Edit: I'm using Java 1.4 and will be moving to Java 5 shortly.



Looking at the implementation differences, I see that:

JavaScript's encodeURIComponent:

  • literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:

  • literal characters (regex representation): [-a-zA-Z0-9._*]
  • the space character " " is converted into a plus sign "+".
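
To see this difference concretely, here is a minimal sketch that runs each of the characters in question through URLEncoder one at a time. The class name UrlEncoderLiterals and the helper encodeChar are hypothetical, made up for illustration; only URLEncoder.encode itself comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncoderLiterals {
    // Hypothetical helper: URL-encodes a single character as UTF-8.
    static String encodeChar(char c) {
        try {
            return URLEncoder.encode(String.valueOf(c), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this should not happen.
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    public static void main(String[] args) {
        // Punctuation from both literal sets, plus the space character.
        String candidates = "-._*~'()! ";
        for (int i = 0; i < candidates.length(); i++) {
            char c = candidates.charAt(i);
            System.out.println("'" + c + "' -> " + encodeChar(c));
        }
    }
}
```

The characters that come back unchanged are the ones URLEncoder treats as literals; the rest (including the space, which becomes "+") are where the two functions disagree.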

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

  • replace all occurrences of "+" with "%20"
  • replace all occurrences of "%xx" representing any of [~'()!] with their literal counterparts
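
The two post-processing steps above could be sketched roughly as follows. This is an illustration rather than a definitive implementation: the class name UriComponentEncoder and the method name encodeURIComponent are my own choices, and it assumes URLEncoder emits upper-case hex digits (as in the output shown in the question). It sticks to String.replaceAll, which is available from Java 1.4.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UriComponentEncoder {
    // Hypothetical helper: URLEncoder output adjusted to match
    // JavaScript's encodeURIComponent.
    static String encodeURIComponent(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                .replaceAll("\\+", "%20")  // space: "+" -> "%20"
                .replaceAll("%21", "!")    // restore the literals that
                .replaceAll("%27", "'")    // encodeURIComponent leaves
                .replaceAll("%28", "(")    // unescaped
                .replaceAll("%29", ")")
                .replaceAll("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this should not happen.
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    public static void main(String[] args) {
        // \u00B1 is the plus-minus sign from the torture test string.
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \""));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```

The replacements are safe to apply to the encoded string because a literal "+" or "%" in the input has already been escaped (to "%2B" and "%25") by the time they run, so every "+" left in the output stands for a space and every "%" starts a genuine escape sequence.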

