Question
I have the following code in a jar file to convert from UTF-8 to ISO-8859-1. When I execute this jar on Windows I get one result, and on CentOS I get another. Does anyone know why?
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class Test {
    public static void main(String[] args) {
        try {
            String x = "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»";
            Charset utf8charset = Charset.forName("UTF-8");
            Charset iso88591charset = Charset.forName("ISO-8859-1");
            // getBytes() without an argument uses the platform default charset
            ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes());
            CharBuffer data = utf8charset.decode(inputBuffer);
            ByteBuffer outputBuffer = iso88591charset.encode(data);
            byte[] outputData = outputBuffer.array();
            // new String(byte[]) also decodes with the platform default charset
            String z = new String(outputData);
            System.out.println(z);
        }
        catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
On Windows, java -jar test.jar > test.txt creates a file containing: Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »
but on CentOS I get: �?, ä, �?, é, �?, ö, �?, ü, �?, «, »
Recommended answer
First and foremost, get the string into the correct internal representation in Java before even thinking about output. That is, the following should hold:
z.equals("Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »") == true
The above can be verified without any output encoding issues, because it simply prints true or false.
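A minimal sketch of such a check, assuming you also want to rule out the source file's encoding: the expected text is written with \u escapes, and the string under test is dumped as code points, so everything printed is plain ASCII. The class name InternalCheck and its helpers are made up for illustration.

```java
public class InternalCheck {
    // The expected text (A-umlaut, a-umlaut, E-acute, ...) written with
    // Unicode escapes, so the encoding of this source file cannot change it
    static final String EXPECTED =
        "\u00C4, \u00E4, \u00C9, \u00E9, \u00D6, \u00F6, "
        + "\u00DC, \u00FC, \u00DF, \u00AB, \u00BB";

    // Hypothetical helper: true if z is internally the expected string
    static boolean isExpected(String z) {
        return EXPECTED.equals(z);
    }

    // Hypothetical helper: render every char as U+XXXX (ASCII-only output)
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String z = EXPECTED; // stand-in for the string your code produced
        System.out.println(isExpected(z));
        System.out.println(codePoints(z));
    }
}
```

If the check prints false, the code point dump shows which character went wrong, independent of how the console or the redirected file is encoded.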
In Windows you already achieved this with
ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes());
CharBuffer data = utf8charset.decode(inputBuffer);
Because all you need to go from "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»" (UTF-8 bytes misread as Windows-1252) to "Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »" is:
// An explicit windows1252 charset works on CentOS too
ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes(windows1252));
CharBuffer data = utf8charset.decode(inputBuffer);
After this you do something with ISO-8859-1, which is futile because barely half the characters in your original string can be represented in ISO-8859-1, not to mention that you are already done at that point. You can delete the code after utf8charset.decode(inputBuffer).
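To see which characters ISO-8859-1 cannot hold, a rough sketch like the following can help. It assumes the original string was the garbled form (UTF-8 bytes read as Windows-1252), written here with \u escapes; the class name Latin1Check and the helper unencodable are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Assumed garbled input: each target character appears as two chars,
    // U+00C3 or U+00C2 followed by a Windows-1252 character
    static final String GARBLED =
        "\u00C3\u201E, \u00C3\u00A4, \u00C3\u2030, \u00C3\u00A9, "
        + "\u00C3\u2013, \u00C3\u00B6, \u00C3\u0153, \u00C3\u00BC, "
        + "\u00C3\u0178, \u00C2\u00AB, \u00C2\u00BB";

    // Collect the distinct characters of s that ISO-8859-1 cannot encode,
    // in order of first appearance
    static String unencodable(String s) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!latin1.canEncode(c) && sb.indexOf(String.valueOf(c)) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print code points instead of glyphs to keep the output ASCII-only
        for (char c : unencodable(GARBLED).toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Under that assumption, five of the eleven pairs contain a character outside ISO-8859-1 (U+201E, U+2030, U+2013, U+0153, U+0178), and an encoder replaces each of those with a substitute byte, typically '?'.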
So now your code could look like:
String x = "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»";
Charset windows1252 = Charset.forName("Windows-1252");
Charset utf8charset = Charset.forName("UTF-8");
// Recover the raw bytes that were misread as Windows-1252
byte[] bytes = x.getBytes(windows1252);
// Decode those bytes as the UTF-8 they really are
String z = new String(bytes, utf8charset);
// Still wondering why you didn't just have this literal to begin with.
// Check that the strings are internally equal so you know at least that
// the code is working
System.out.println(z.equals("Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, »"));
System.out.println(z);
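As for why the two machines disagreed in the first place: x.getBytes(), new String(bytes) and System.out all fall back to the platform default charset when none is given. The sketch below makes that visible. It assumes, without confirmation from the question, that the Windows machine defaulted to windows-1252 and the CentOS one to something else. DefaultCharsetDemo and hex are illustrative names.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Hypothetical helper: format bytes as space-separated uppercase hex
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u00C4"; // capital A with diaeresis

        // Differs between machines, so the no-argument calls differ too
        System.out.println("default: " + Charset.defaultCharset());
        System.out.println("getBytes():       " + hex(s.getBytes()));

        // Explicit charsets give the same bytes on every platform
        System.out.println("getBytes(UTF-8):  "
            + hex(s.getBytes(StandardCharsets.UTF_8)));          // C3 84
        System.out.println("getBytes(cp1252): "
            + hex(s.getBytes(Charset.forName("windows-1252")))); // C4

        // Output can be pinned the same way instead of relying on the default
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(s);
    }
}
```

Passing the charset explicitly at every conversion, including the output stream, is what makes the result the same on both systems.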