Problem description
I tried to use java.io.FileReader to read some text files and convert them into strings, but the result is wrongly encoded and not readable at all.
Here is my environment:
Windows 2003, OS encoding: CP1252
Java 5.0
My files are either UTF-8 encoded or CP1252 encoded, and some of the UTF-8 encoded files may contain Chinese (non-Latin) characters.
I use the following code to do my work:
import java.io.BufferedReader;
import java.io.FileReader;

private static String readFileAsString(String filePath)
        throws java.io.IOException {
    StringBuffer fileData = new StringBuffer(1000);
    FileReader fileReader = new FileReader(filePath);
    //System.out.println(fileReader.getEncoding());
    // Wrap the FileReader under a distinct variable name so the method compiles
    BufferedReader reader = new BufferedReader(fileReader);
    char[] buf = new char[1024];
    int numRead = 0;
    while ((numRead = reader.read(buf)) != -1) {
        String readData = String.valueOf(buf, 0, numRead);
        fileData.append(readData);
        buf = new char[1024];
    }
    reader.close();
    return fileData.toString();
}
The above code doesn't work. I found that the FileReader's encoding is CP1252 even when the text is UTF-8 encoded. But the JavaDoc of java.io.FileReader says:
"The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate."
Does this mean that I am not required to set the character encoding myself when I use FileReader? I am currently getting wrongly encoded data, so what is the correct way to deal with my situation? Thanks.
Recommended answer
Yes, you need to specify the encoding of the file you want to read.
Yes, this means that you have to know the encoding of the file you want to read.
No, there is no general way to guess the encoding of any given "plain text" file.
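That said, when the candidates are known to be only UTF-8 and CP1252, as in the question, a narrow heuristic is possible: try a strict UTF-8 decode and fall back to CP1252 if the bytes are not valid UTF-8. The following is only a sketch under that assumption. The class name Utf8OrCp1252 and the method decode are made up for illustration, and a CP1252 file whose bytes happen to form valid UTF-8 would still be misread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: assumes the input is either UTF-8 or CP1252, nothing else.
class Utf8OrCp1252 {
    static String decode(byte[] bytes) {
        try {
            // REPORT makes the decoder throw on invalid UTF-8
            // instead of silently substituting replacement characters.
            return Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so fall back to the only other candidate.
            return new String(bytes, Charset.forName("windows-1252"));
        }
    }
}
```

This works tolerably because most CP1252 text containing accented characters is not valid UTF-8, but it remains a guess, not a detection.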
The one-argument constructors of FileReader always use the platform default encoding, which is generally a bad idea.
Since Java 11, FileReader has also gained constructors that accept an encoding: new FileReader(file, charset) and new FileReader(fileName, charset).
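As a minimal sketch of the Java 11 route, the question's method could take the charset as a parameter and pass it to FileReader. The class name CharsetFileReaderExample and the extra charset parameter are assumptions for illustration, not part of the original code.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical example class; requires Java 11 or later.
class CharsetFileReaderExample {
    static String readFileAsString(String fileName, Charset charset) throws IOException {
        StringBuilder fileData = new StringBuilder();
        // FileReader(String, Charset) decodes with the given charset
        // instead of the platform default.
        try (Reader reader = new FileReader(fileName, charset)) {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                fileData.append(buf, 0, numRead);
            }
        }
        return fileData.toString();
    }
}
```

A caller would pass StandardCharsets.UTF_8 for the UTF-8 files and, presumably, Charset.forName("windows-1252") for the CP1252 ones.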
In earlier versions of Java, you need to use new InputStreamReader(new FileInputStream(pathToFile), <encoding>).
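Applied to the method from the question, that might look roughly like the sketch below. It sticks to APIs available in Java 5. The class name ExplicitEncodingReader and the charsetName parameter are assumptions added for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical example class; works on Java 5 and later.
class ExplicitEncodingReader {
    static String readFileAsString(String filePath, String charsetName) throws IOException {
        StringBuilder fileData = new StringBuilder(1000);
        FileInputStream in = new FileInputStream(filePath);
        try {
            // InputStreamReader decodes bytes using the named charset;
            // it throws UnsupportedEncodingException for an unknown name.
            Reader reader = new BufferedReader(new InputStreamReader(in, charsetName));
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                fileData.append(buf, 0, numRead);
            }
        } finally {
            // Closing the underlying stream releases the file handle.
            in.close();
        }
        return fileData.toString();
    }
}
```

Calling readFileAsString(path, "UTF-8") would then decode the UTF-8 files correctly regardless of the OS encoding. The caller still has to know which files are UTF-8 and which are CP1252.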