This article covers how to deal with SparkR displaying Chinese characters incorrectly.

Problem description

I am new to SparkR. Recently I ran into a problem: after loading data that contains Chinese characters into SparkR, the characters are no longer displayed properly. For example:

city=c("北京","上海","杭州")
A <- as.data.frame(city)
A
  city
1 北京
2 上海
3 杭州

Then I created a SparkR DataFrame from it and collected it back, and everything changed:

collect(createDataFrame(sqlContext,A))
      city
1 \027\xac
2      \nw
3    m\xde

I don't know how to convert them back to readable Chinese characters. Ideally I would like SparkR to return readable characters directly, since that would make debugging much easier.

I am running on a Linux server and am not sure whether that is related. Does anybody know what is going on?

Below is the output of sessionInfo():

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.2 (Maipo)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SparkR_1.5.2

loaded via a namespace (and not attached):
[1] tools_3.2.2
Solution

This is a known issue that affects Unicode characters in general, and it is fixed in 1.6. See SPARK-8951. You can either patch and rebuild 1.5, or upgrade to 1.6.
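
Before patching or upgrading, it may help to rule out the local R session as the cause and to confirm which SparkR version is in use. The following is a minimal base-R sketch, not part of the original answer. The helper name `needs_upgrade` is hypothetical, and the threshold "1.6.0" is only an interpretation of the answer's "solved in 1.6". No Spark session is needed to run it.

```r
# Sketch only: base R, no Spark session required.

# 1. Confirm the local R session treats the strings as UTF-8.
#    \u escapes are used so the script does not depend on the file encoding;
#    they spell the same three city names as in the question.
city <- enc2utf8(c("\u5317\u4eac", "\u4e0a\u6d77", "\u676d\u5dde"))
n_chars <- nchar(city, type = "chars")  # 2 characters per name
n_bytes <- nchar(city, type = "bytes")  # 6 bytes per name (3 bytes per character in UTF-8)
local_ok <- all(n_chars == 2L) && all(n_bytes == 6L)

# 2. Hypothetical helper: is a given SparkR version older than 1.6,
#    the release the answer says contains the fix?
needs_upgrade <- function(version_string) {
  package_version(version_string) < package_version("1.6.0")
}

print(local_ok)
print(needs_upgrade("1.5.2"))  # the SparkR version reported by sessionInfo() in the question
```

In a live session the installed version could be passed in with `needs_upgrade(as.character(packageVersion("SparkR")))`. If `local_ok` is TRUE and the version is below 1.6, the garbled output is more likely the Spark-side issue described above than a locale problem on the server.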
