How to parse a CSV file with UTF-8 encoding?
Question
I use Spark 2.1.
The input CSV file contains Unicode characters like those shown below.
When parsing this CSV file, the output looks like the following.
I use MS Excel 2010 to view the files.
The Java code used is:
@Test
public void TestCSV() throws IOException {
String inputPath = "/user/jpattnaik/1945/unicode.csv";
String outputPath = "file:\\C:\\Users\\jpattnaik\\ubuntu-bkp\\backup\\bug-fixing\\1945\\output-csv";
getSparkSession()
.read()
.option("inferSchema", "true")
.option("header", "true")
.option("encoding", "UTF-8")
.csv(inputPath)
.write()
.option("header", "true")
.option("encoding", "UTF-8")
.mode(SaveMode.Overwrite)
.csv(outputPath);
}
How can I get the output to match the input?
Answer
My guess is that the input file is not in UTF-8, and hence you get the incorrect characters.
My recommendation would be to write a pure Java application (with no Spark at all) and see if reading and writing give the same results with UTF-8 encoding.
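That sanity check could be sketched as follows: a minimal standalone Java program (no Spark) that round-trips non-ASCII text through a file with UTF-8, and also shows the kind of mojibake you get when the same bytes are decoded with the wrong charset. The class name, sample text, and temp-file path are illustrative, not taken from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {
    // Writes the text to the file as UTF-8 bytes, reads it back as UTF-8,
    // and returns whether the round trip preserved the characters.
    public static boolean roundTrips(String text, Path file) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        String back = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return text.equals(back);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("unicode-check", ".csv");
        String sample = "name,city\nJosé,Zürich\n";

        // Reading and writing with matching UTF-8 encoding preserves the text.
        System.out.println(roundTrips(sample, tmp)); // true

        // Decoding UTF-8 bytes with a different charset corrupts the text,
        // which is what wrong characters in the Spark output would suggest.
        String wrong = new String(sample.getBytes(StandardCharsets.UTF_8),
                                  StandardCharsets.ISO_8859_1);
        System.out.println(sample.equals(wrong)); // false (e.g. "José" becomes "JosÃ©")

        Files.deleteIfExists(tmp);
    }
}
```

If this round trip succeeds but the Spark output is still wrong, that points at the input file itself being saved in some other encoding, so the `encoding` option on the reader would need to name that charset instead of UTF-8.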