SparkSQL: reading a Parquet file directly

Question

I am migrating from Impala to SparkSQL, using the following code to read a table:

my_data = sqlContext.read.parquet('hdfs://my_hdfs_path/my_db.db/my_table')

How do I run a SparkSQL query against the data loaded above, for example:

'select col_A, col_B from my_table'

Answer

After creating a DataFrame from a Parquet file, you have to register it as a temporary table before you can run SQL queries on it.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val df = sqlContext.read.parquet("src/main/resources/peopleTwo.parquet")

df.printSchema

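// After registering the DataFrame as a temp table, you can run SQL queries on it.
// (registerTempTable is the Spark 1.x API; Spark 2.0+ deprecates it in favor of createOrReplaceTempView.)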
df.registerTempTable("people")

sqlContext.sql("select * from people").collect.foreach(println)
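The answer above is in Scala, while the question uses PySpark. A minimal Python sketch of the same steps might look like the following. The helper name `query_parquet` and its parameters are hypothetical, chosen only for illustration, and `sql_context` is assumed to be an existing Spark 1.x `SQLContext` like the `sqlContext` in the question.

```python
def query_parquet(sql_context, path, table_name, query):
    """Load a Parquet path, register it as a temp table, and run a SQL query.

    Hypothetical helper for illustration; sql_context is assumed to be a
    pyspark.sql.SQLContext (Spark 1.x), as in the question.
    """
    # Build a DataFrame backed by the Parquet files at `path`.
    df = sql_context.read.parquet(path)
    # Expose the DataFrame to SQL under `table_name`.
    df.registerTempTable(table_name)
    # Run the query; the result is itself a DataFrame.
    return sql_context.sql(query)


# Example call, assuming `sqlContext` already exists as in the question:
# result = query_parquet(
#     sqlContext,
#     'hdfs://my_hdfs_path/my_db.db/my_table',
#     'my_table',
#     'select col_A, col_B from my_table',
# )
# result.show()
```

On Spark 1.6 and later, Spark SQL can also query a Parquet path without the registration step, by backquoting the path after a `parquet.` prefix in the FROM clause, for example sqlContext.sql("select col_A, col_B from parquet.`hdfs://my_hdfs_path/my_db.db/my_table`"). Whether that suits a migration from Impala depends on the Spark version in use.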

