Question
I am new to Spark. I want to write DataFrame data into a Hive table. The Hive table is partitioned on multiple columns. Using the Hive metastore client, I get the partition columns and pass them as a variable to the partitionBy clause in the DataFrame's write method.
var1="country","state" (Getting the partition column names of the Hive table)
dataframe1.write.partitionBy(s"$var1").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")
When I execute the above code, it fails with an error saying that partition "country","state" does not exist. I think it is treating "country","state" as a single string.
Can you please help me?
Recommended answer
The partitionBy function takes varargs, not a list. You can call it as
dataframe1.write.partitionBy("country","state").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")
Or, in Scala, you can expand a list into varargs like this:
val columns = Seq("country","state")
dataframe1.write.partitionBy(columns:_*).mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")
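If the partition column names arrive as one comma-separated string (as var1 in the question seems to), they need to be split into separate names before being expanded. The following is a minimal sketch under that assumption. The partitionBy defined here is a hypothetical stand-in with the same varargs shape as the Spark method, so the snippet runs without a Spark session; the object name and the raw value are illustrative only.

```scala
object PartitionByExample {
  // Hypothetical stand-in mirroring the varargs shape partitionBy(colNames: String*);
  // it simply returns the column names it received.
  def partitionBy(colNames: String*): Seq[String] = colNames

  // Assumption: the metastore lookup produced a single quoted, comma-separated string.
  val raw: String = "\"country\",\"state\""

  // Split on commas, then strip whitespace and surrounding quotes from each name.
  val columns: Seq[String] =
    raw.split(",").map(_.trim.stripPrefix("\"").stripSuffix("\"")).toSeq

  // Passing the raw string yields one column name that does not exist in the data.
  val wrong: Seq[String] = partitionBy(raw)

  // Expanding the Seq with `: _*` passes each name as its own argument.
  val right: Seq[String] = partitionBy(columns: _*)
}
```

With the real DataFrame, the same columns value would be passed as dataframe1.write.partitionBy(columns: _*), exactly as in the answer above.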