This article covers how to replace null values in a column with 0, or otherwise with a default value, in SparkR.

Problem description

In the SparkR shell 1.5.0, I created a sample data set:

df_test <- createDataFrame(sqlContext, data.frame(mon = c(1,2,3,4,5), year = c(2011,2012,2013,2014,2015)))
df_test1 <- createDataFrame(sqlContext, data.frame(mon1 = c(1,2,3,4,5,6,7,8)))
df_test2 <- join(df_test1, df_test, joinExpr = df_test1$mon1 == df_test$mon, joinType = "left_outer")

Data set: df_test2

+----+----+------+
|mon1| mon|  year|
+----+----+------+
| 7.0|null|  null|
| 1.0| 1.0|2011.0|
| 6.0|null|  null|
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null|  null|
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+

Question: If there is a null in column df_test2$year, how can I replace it with 0, or otherwise use a default value?

The output should look like this:

+----+----+------+
|mon1| mon|  year|
+----+----+------+
| 7.0|null|  0   |
| 1.0| 1.0|2011.0|
| 6.0|null|  0   |
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null|  0   |
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+

I tried otherwise/when, but it doesn't work:

df_test2$year <- otherwise(when(isNull(df_test2$year), 0 ), df_test2$year)

It threw an error:

Error in rep(yes, length.out = length(ans)) :
  attempt to replicate an object of type 'environment'

Solution

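I used a raw SQL CASE WHEN expression to get the answer:

# The query reads from a table named "temp", so df_test2 is assumed to be registered under that name first.
registerTempTable(df_test2, "temp")
df_test3 <- sql(sqlContext, "select mon1, mon, case when year is null then 0 else year end year FROM temp")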

showDF(df_test3)
+----+----+------+
|mon1| mon|  year|
+----+----+------+
| 7.0|null|   0.0|
| 1.0| 1.0|2011.0|
| 6.0|null|   0.0|
| 3.0| 3.0|2013.0|
| 5.0| 5.0|2015.0|
| 8.0|null|   0.0|
| 4.0| 4.0|2014.0|
| 2.0| 2.0|2012.0|
+----+----+------+

Even though this gives the answer, I am looking for pure SparkR code.
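
One possible pure SparkR route, offered as a sketch and not as a confirmed answer from the original thread, is the `fillna` function, which replaces nulls with a given value and can be limited to chosen columns. The code below assumes the SparkR 1.x API used in the question (`sparkR.init`, `sparkRSQL.init`, and `createDataFrame` taking `sqlContext`) and that `fillna` is available in the installed version. The name `df_filled` is hypothetical.

```r
library(SparkR)

# Inside the SparkR shell, sc and sqlContext already exist; these two calls
# are only needed when running the script outside the shell (SparkR 1.x API).
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Rebuild the data set from the question.
df_test  <- createDataFrame(sqlContext, data.frame(mon = c(1, 2, 3, 4, 5),
                                                   year = c(2011, 2012, 2013, 2014, 2015)))
df_test1 <- createDataFrame(sqlContext, data.frame(mon1 = c(1, 2, 3, 4, 5, 6, 7, 8)))
df_test2 <- join(df_test1, df_test,
                 joinExpr = df_test1$mon1 == df_test$mon,
                 joinType = "left_outer")

# Replace nulls with 0 in the "year" column only; "mon" keeps its nulls.
# df_filled is a hypothetical name chosen for this sketch.
df_filled <- fillna(df_test2, 0, cols = "year")

showDF(df_filled)
```

To use a default other than 0, pass that value as the second argument to `fillna`. Restricting `cols` to "year" leaves the nulls in `mon` untouched, which matches the expected output in the question.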


08-24 14:49