Problem description
In the SparkR 1.5.0 shell, I created a sample data set:
df_test <- createDataFrame(sqlContext, data.frame(mon = c(1,2,3,4,5), year = c(2011,2012,2013,2014,2015)))
df_test1 <- createDataFrame(sqlContext, data.frame(mon1 = c(1,2,3,4,5,6,7,8)))
df_test2 <- join(df_test1, df_test, joinExpr = df_test1$mon1 == df_test$mon, joinType = "left_outer")
Data set: df_test2
+----+----+------+
|mon1| mon| year|
+----+----+------+
| 7.0|null| null|
| 1.0| 1.0|2011.0|
| 6.0|null| null|
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null| null|
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+
Question: How can I replace each null in column df_test2$year with 0 (or some other default value), while leaving the non-null values unchanged?
The output should look like this:
+----+----+------+
|mon1| mon| year|
+----+----+------+
| 7.0|null| 0 |
| 1.0| 1.0|2011.0|
| 6.0|null| 0 |
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null| 0 |
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+
I tried otherwise/when, but it doesn't work:
df_test2$year <- otherwise(when(isNull(df_test2$year), 0 ), df_test2$year)
It throws this error:
Error in rep(yes, length.out = length(ans)) :
attempt to replicate an object of type 'environment'
Solution
I used a raw SQL CASE WHEN expression to get the answer:
registerTempTable(df_test2, "temp")  # expose df_test2 to SQL under the name temp
df_test3 <- sql(sqlContext, "select mon1, mon, case when year is null then 0 else year end year FROM temp")
showDF(df_test3)
+----+----+------+
|mon1| mon| year|
+----+----+------+
| 7.0|null| 0.0|
| 1.0| 1.0|2011.0|
| 6.0|null| 0.0|
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null| 0.0|
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+
Even though this gives the answer, I am looking for pure SparkR code.
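For checking the expected result locally only: the following sketch is plain base R, not SparkR, so it does not satisfy the "pure SparkR" requirement. It rebuilds the two sample tables as ordinary data.frames, performs the left outer join with match(), and applies the same rule as the SQL CASE WHEN year IS NULL THEN 0 ELSE year END. The local_* names are hypothetical; base R uses NA where Spark shows null, and the rows stay in mon1 order rather than Spark's arbitrary order.

```r
# Plain base R sketch (no Spark): local stand-ins for df_test and df_test1.
# The local_* names are hypothetical, chosen for this illustration only.
local_test  <- data.frame(mon = c(1, 2, 3, 4, 5),
                          year = c(2011, 2012, 2013, 2014, 2015))
local_test1 <- data.frame(mon1 = c(1, 2, 3, 4, 5, 6, 7, 8))

# Left outer join on mon1 == mon: match() returns NA for mon1 values with no partner,
# and indexing with NA yields NA, which plays the role of Spark's null.
idx <- match(local_test1$mon1, local_test$mon)
local_test2 <- data.frame(mon1 = local_test1$mon1,
                          mon  = local_test$mon[idx],
                          year = local_test$year[idx])

# Same rule as CASE WHEN year IS NULL THEN 0 ELSE year END.
local_test2$year <- ifelse(is.na(local_test2$year), 0, local_test2$year)
print(local_test2)
```

The rows with mon1 = 6, 7 and 8 end up with year 0 while mon stays NA, which matches the year column of the expected output in the question.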