Problem description
I am new to SparkR. These days I ran into a problem: after loading a file containing Chinese characters into SparkR, they are no longer displayed properly. Like this:
city=c("北京","上海","杭州")
A <- as.data.frame(city)
A
city
1 北京
2 上海
3 杭州
Then I created a DataFrame in SparkR based on that and collected it back, and everything changed:
collect(createDataFrame(sqlContext,A))
city
1 \027\xac
2 \nw
3 m\xde
I don't know how to convert them back to readable Chinese characters. Ideally I would get readable characters in SparkR directly, which would make debugging much easier.
I am running on a Linux server and am not sure whether that is related. Does anybody know anything about this?
Below is the output of sessionInfo():
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.2 (Maipo)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SparkR_1.5.2
loaded via a namespace (and not attached):
[1] tools_3.2.2
It is a known issue (it affects Unicode characters in general) and is already solved in 1.6. See SPARK-8951. You can either patch and rebuild 1.5, or upgrade to 1.6.
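After upgrading to 1.6 (or rebuilding 1.5 with the SPARK-8951 patch), the original round-trip should preserve the characters. A minimal sketch of the check, assuming a local Spark 1.6 installation with SparkR on the library path:

```r
library(SparkR)

# Initialize SparkR and a SQL context (Spark 1.x API)
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Same data as in the question
A <- data.frame(city = c("北京", "上海", "杭州"), stringsAsFactors = FALSE)

# With the fix applied, collect() should return the Chinese
# characters intact instead of mangled byte sequences
collect(createDataFrame(sqlContext, A))

sparkR.stop()
```

Note that a UTF-8 locale on the driver (as in the sessionInfo() above, LC_CTYPE=en_US.UTF-8) is still needed for the characters to print correctly in the R console.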