Why does conversion from UTF-8 to ISO-8859-1 differ between Windows and Linux?

Problem description

I have the following code in a jar file to convert from UTF-8 to ISO-8859-1. When I execute this jar on Windows I get one result, and on CentOS I get another. Does anyone know why?

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public static void main(String[] args) {
  try {
    String x = "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»";

    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    // getBytes() without an argument uses the platform default charset
    ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes());
    CharBuffer data = utf8charset.decode(inputBuffer);

    ByteBuffer outputBuffer = iso88591charset.encode(data);
    byte[] outputData = outputBuffer.array();

    // new String(byte[]) also uses the platform default charset
    String z = new String(outputData);

    System.out.println(z);
  }
  catch(Exception e) {
    System.out.println(e.getMessage());
  }
}

On Windows, java -jar test.jar > test.txt creates a file containing:

Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »

But on CentOS I get:

�?, ä, �?, é, �?, ö, �?, ü, �?, «, »

Recommended answer

First and foremost, get the string into its correct internal representation in Java before even thinking about output. That is, the following should hold:

z.equals("Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »") == true

The above can be verified without any output encoding issues, because it simply prints true or false.
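
Another way to inspect the internal representation without depending on the console is to print code points as ASCII text. The following is a minimal sketch, not part of the original answer; the class name CodePointDump and the helper dump are hypothetical, and the sample string is written with Unicode escapes so that the source file encoding cannot interfere.

```java
import java.util.stream.Collectors;

final class CodePointDump {
    // Hypothetical helper: renders every code point as U+XXXX, so the result is pure ASCII.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A-umlaut, comma, space, a-umlaut, spelled with escapes instead of literal characters
        String sample = "\u00C4, \u00E4";
        System.out.println(dump(sample)); // prints U+00C4 U+002C U+0020 U+00E4
    }
}
```

If the dump of z shows U+00C4 where you expect Ä, the string is correct internally and any remaining garbling comes from the output side.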

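On Windows you already achieved this with the following, because x.getBytes() without an argument uses the platform default charset, which on your Windows machine is evidently windows-1252: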

ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes());
CharBuffer data = utf8charset.decode(inputBuffer);

Because all you need to go from "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»" to "Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »" is:

ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes(windows1252 /* explicit windows1252 works on CentOS too */));
CharBuffer data = utf8charset.decode(inputBuffer);
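
To see why the implicit version behaves differently on the two machines, the sketch below replays the question's four steps with the platform default charset passed in as a parameter. It assumes, which the question does not state, that the Windows default is windows-1252 and the CentOS default is UTF-8. The class name DefaultCharsetReplay and the helper replay are hypothetical, and the strings are spelled with Unicode escapes to keep the source file ASCII-only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetReplay {
    // The garbled input string from the question, written with escapes.
    static final String X =
        "\u00C3\u201E, \u00C3\u00A4, \u00C3\u2030, \u00C3\u00A9, \u00C3\u2013, \u00C3\u00B6, "
        + "\u00C3\u0153, \u00C3\u00BC, \u00C3\u0178, \u00C2\u00AB, \u00C2\u00BB";

    // The intended result, also written with escapes.
    static final String EXPECTED =
        "\u00C4, \u00E4, \u00C9, \u00E9, \u00D6, \u00F6, \u00DC, \u00FC, \u00DF, \u00AB, \u00BB";

    // Hypothetical helper: the question's steps, with the platform default made explicit.
    static String replay(String x, Charset platformDefault) {
        byte[] input = x.getBytes(platformDefault);                    // x.getBytes()
        String decoded = new String(input, StandardCharsets.UTF_8);    // utf8charset.decode(...)
        byte[] output = decoded.getBytes(StandardCharsets.ISO_8859_1); // iso88591charset.encode(...)
        return new String(output, platformDefault);                    // new String(outputData)
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        System.out.println(replay(X, windows1252).equals(EXPECTED));            // prints true
        System.out.println(replay(X, StandardCharsets.UTF_8).equals(EXPECTED)); // prints false
    }
}
```

With a UTF-8 default, the first two steps cancel out and return x unchanged. Encoding x to ISO-8859-1 then turns the five characters it cannot represent into '?', and decoding the result as UTF-8 yields a replacement character followed by '?' for each of them, which matches the CentOS output in the question.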

After the decode step you do something with ISO-8859-1, which is futile: barely half the characters in your original string can be represented in ISO-8859-1, and in any case you are already done as per the above. You can delete the code after utf8charset.decode(inputBuffer).
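
The claim about ISO-8859-1 can be checked with CharsetEncoder.canEncode. This is a small sketch under the same assumptions about the input string; the class name EncodableCheck and the helper unencodable are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class EncodableCheck {
    // The garbled input string from the question, written with escapes.
    static final String X =
        "\u00C3\u201E, \u00C3\u00A4, \u00C3\u2030, \u00C3\u00A9, \u00C3\u2013, \u00C3\u00B6, "
        + "\u00C3\u0153, \u00C3\u00BC, \u00C3\u0178, \u00C2\u00AB, \u00C2\u00BB";

    // Hypothetical helper: the distinct chars of s, in first-seen order, that cs cannot encode.
    static String unencodable(String s, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!encoder.canEncode(c) && missing.indexOf(String.valueOf(c)) < 0) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Number of distinct characters in X that ISO-8859-1 cannot represent
        System.out.println(unencodable(X, StandardCharsets.ISO_8859_1).length()); // prints 5
    }
}
```

The five characters are U+201E, U+2030, U+2013, U+0153 and U+0178. Each belongs to one of the first, third, fifth, seventh and ninth items of the string, which are exactly the positions that come out damaged on CentOS.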

So now your code could look like:

import java.nio.charset.Charset;

String x = "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»";

Charset windows1252 = Charset.forName("Windows-1252");
Charset utf8charset = Charset.forName("UTF-8");

// Recover the original UTF-8 bytes, then decode them as UTF-8
byte[] bytes = x.getBytes(windows1252);
String z = new String(bytes, utf8charset);

// Still wondering why you didn't just have this literal to begin with.
// Check that the strings are internally equal so you know at least that
// the code is working.
System.out.println(z.equals("Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »"));
System.out.println(z);
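
Once z is correct internally, the bytes written by System.out.println(z) can still depend on the platform's output encoding. One possible way to pin them down, sketched here as an assumption rather than as part of the original answer, is to wrap the stream in a PrintStream with an explicit charset. The class name ExplicitOutput and the helper render are hypothetical; render writes into a byte array so the effect can be checked without a terminal.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class ExplicitOutput {
    // Hypothetical helper: the bytes a PrintStream with the named charset emits for s.
    static byte[] render(String s, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(s);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String z = "\u00C4"; // A-umlaut, written as an escape
        System.out.println(render(z, "UTF-8").length);      // prints 2
        System.out.println(render(z, "ISO-8859-1").length); // prints 1

        // The same idea applied to standard output: the encoding no longer follows the platform.
        PrintStream stdout = new PrintStream(System.out, true, "UTF-8");
        stdout.println(z);
    }
}
```

Whichever charset you choose, the file produced by the redirect then has to be opened with that same charset to display correctly.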
