
Problem description

I can load multiple files at once by passing multiple paths to the load method, e.g.

spark.read
  .format("com.databricks.spark.avro")
  .load(
    "/data/src/entity1/2018-01-01",
    "/data/src/entity1/2018-01-12",
    "/data/src/entity1/2018-01-14")

I'd like to prepare a list of paths first and pass them to the load method, but I get the following compilation error:

val paths = Seq(
  "/data/src/entity1/2018-01-01",
  "/data/src/entity1/2018-01-12",
  "/data/src/entity1/2018-01-14")
spark.read.format("com.databricks.spark.avro").load(paths)

<console>:29: error: overloaded method value load with alternatives:
  (paths: String*)org.apache.spark.sql.DataFrame <and>
  (path: String)org.apache.spark.sql.DataFrame
 cannot be applied to (List[String])
       spark.read.format("com.databricks.spark.avro").load(paths)

Why does this happen, and how can I pass a list of paths to the load method?

Recommended answer

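Neither overload accepts a List[String]: load takes either a single String or a varargs String*, and Scala does not expand a collection into varargs arguments automatically. You just need to apply the splat operator (: _*) to the paths list: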

spark.read.format("com.databricks.spark.avro").load(paths: _*)
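
To see why the expansion is needed without a Spark cluster, here is a minimal sketch in plain Scala. FakeReader and SplatDemo are hypothetical names: FakeReader only mimics the two load overloads listed in the compiler error and does not read any files. The sketch also builds the path list from the dates in the question before passing it with : _*.

```scala
// Hypothetical stand-in for DataFrameReader: it only mimics the two
// load overloads named in the compiler error and reads no files.
object FakeReader {
  // Mirrors load(path: String)
  def load(path: String): String = s"one path: $path"

  // Mirrors load(paths: String*); inside the method, paths is a Seq[String]
  def load(paths: String*): String =
    s"${paths.length} paths: ${paths.mkString(", ")}"
}

object SplatDemo {
  // Prepare the path list up front, as in the question
  val days: Seq[String]  = Seq("2018-01-01", "2018-01-12", "2018-01-14")
  val paths: Seq[String] = days.map(day => s"/data/src/entity1/$day")

  def main(args: Array[String]): Unit = {
    // FakeReader.load(paths)  // does not compile: a Seq[String] is neither a String nor a String*
    println(FakeReader.load(paths: _*)) // : _* passes each element as a separate varargs argument
  }
}
```

Running SplatDemo prints "3 paths: /data/src/entity1/2018-01-01, /data/src/entity1/2018-01-12, /data/src/entity1/2018-01-14". With a real SparkSession the call has the same shape: spark.read.format("com.databricks.spark.avro").load(paths: _*).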

10-28 17:58