Java application: unable to read an ISO-8859-1 encoded file correctly

Problem description

I have a file which is encoded as ISO-8859-1 and contains characters such as ô.

I am reading this file with Java code, something like:

File in = new File("myfile.csv");
InputStream fr = new FileInputStream(in);
byte[] buffer = new byte[4096];
while (true) {
    int byteCount = fr.read(buffer, 0, buffer.length);
    if (byteCount <= 0) {
        break;
    }

    String s = new String(buffer, 0, byteCount,"ISO-8859-1");
    System.out.println(s);
}

However, the ô character is always garbled, usually printing as a ?.

I have read around the subject (and learnt a little on the way), e.g.:

  • http://www.joelonsoftware.com/articles/Unicode.html
  • http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
  • http://www.ingrid.org/java/i18n/utf-16/

But I still can't get this to work.

I have checked that my JDK supports the required charsets (they are standard, so this is no surprise) using:

System.out.println(java.nio.charset.Charset.availableCharsets());


Recommended answer

I suspect that either your file isn't actually encoded as ISO-8859-1, or System.out doesn't know how to print the character.

To check for the first, I recommend that you examine the relevant byte in the file. To check for the second, examine the relevant character in the string, printing it out with:

 System.out.println((int) s.charAt(index));

In both cases the result should be 244 decimal (0xf4 hex).
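
As a rough illustration of that check, the sketch below decodes a single byte as ISO-8859-1 and reports the resulting character as a hex code rather than as a glyph, so the result does not depend on the console. The class name `Latin1Check` and its helper methods are hypothetical, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Decodes raw bytes as ISO-8859-1; each byte maps to the code point with the same value.
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Renders each char of s as a hex number, separated by spaces.
    public static String hexCodes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xF4}; // the byte an ISO-8859-1 file should contain for ô
        System.out.println(hexCodes(decode(raw))); // prints "f4"
    }
}
```

If the byte in the file is not 0xf4 (for example, the two bytes 0xc3 0xb4), the file is probably in a different encoding such as UTF-8.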

See my article on Unicode debugging for general advice (the code presented is in C#, but it's easy to convert to Java, and the principles are the same).

In general, by the way, I'd wrap the stream with an InputStreamReader with the right encoding. It's easier than creating new strings "by hand". I realise this may just be demo code though.
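
A minimal sketch of the InputStreamReader approach might look like the following. The class name `Latin1Reader` and the `readLines` helper are hypothetical; the file name `myfile.csv` is taken from the question, and try-with-resources assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Latin1Reader {
    // Reads all lines from the stream, decoding the bytes as ISO-8859-1.
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("myfile.csv")) {
            for (String line : readLines(in)) {
                System.out.println(line);
            }
        }
    }
}
```

The reader does the decoding as the bytes arrive, so there is no per-buffer `new String(...)` call. ISO-8859-1 is a single-byte encoding, so the manual approach in the question does not split characters across buffers, but it would with a multi-byte encoding such as UTF-8.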

EDIT: Here's a really easy way to prove whether or not the console will work:

 System.out.println("Here's the character: \u00f4");
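
If that line also prints a ?, the string itself is fine and the problem lies in how the output is encoded. The sketch below is an illustration, not part of the original answer: it shows which bytes a PrintStream produces for ô under different charsets. The class name `ConsoleEncoding` and the `encode` helper are hypothetical, and `Charset.defaultCharset()` is only a hint, since the encoding that System.out actually uses can differ between JDK versions and platforms.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncoding {
    // Writes text through a PrintStream using the named charset and returns the bytes produced.
    public static byte[] encode(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    // Formats bytes as two-digit hex values separated by spaces.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u00f4";
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("US-ASCII:   " + hex(encode(text, "US-ASCII")));   // 3f, i.e. '?'
        System.out.println("ISO-8859-1: " + hex(encode(text, "ISO-8859-1"))); // f4
        System.out.println("UTF-8:      " + hex(encode(text, "UTF-8")));      // c3 b4
    }
}
```

A charset that cannot represent ô, such as US-ASCII, substitutes a ?, which matches the symptom in the question. If the console expects a specific encoding, one option is to wrap System.out in a `new PrintStream(System.out, true, "UTF-8")` (or whichever charset the terminal uses) and print through that instead.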

