SparkSQL - Read parquet file directly
Problem description
I am migrating from Impala to SparkSQL, using the following code to read a table:
my_data = sqlContext.read.parquet('hdfs://my_hdfs_path/my_db.db/my_table')
How do I run a SparkSQL query on the data loaded above, so that it returns the result of something like:
'select col_A, col_B from my_table'
Solution
After creating a DataFrame from a parquet file, you have to register it as a temp table to run SQL queries on it.
// sc is an existing SparkContext (spark-shell provides one)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.parquet("src/main/resources/peopleTwo.parquet")
df.printSchema
// after registering as a table you will be able to run sql queries
df.registerTempTable("people")
sqlContext.sql("select * from people").collect.foreach(println)
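Since the question reads the table from Python, the same steps (load the parquet data, register it under a name, query it with SQL) can be sketched in PySpark. This is only a minimal sketch under stated assumptions: it assumes Spark 2.0 or later, where registerTempTable is deprecated in favour of createOrReplaceTempView, and the helper select_from_parquet and its parameter names are hypothetical, not part of Spark.

```python
# Sketch only: `select_from_parquet` is a hypothetical helper, not a Spark API.
# `spark` is assumed to be a SparkSession (Spark 2.0+).


def select_from_parquet(spark, path, view_name, query):
    """Load a parquet dataset, expose it as `view_name`, and run `query` on it."""
    # DataFrame backed by the parquet files under `path`
    df = spark.read.parquet(path)
    # Make the DataFrame visible to SQL under `view_name`
    # (replaces registerTempTable, which is deprecated since Spark 2.0)
    df.createOrReplaceTempView(view_name)
    # The result of a SQL query is again a DataFrame
    return spark.sql(query)


# Example call, using the path and columns from the question:
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   result = select_from_parquet(
#       spark,
#       "hdfs://my_hdfs_path/my_db.db/my_table",
#       "my_table",
#       "select col_A, col_B from my_table",
#   )
#   result.show()
```

On Spark 1.x, where only sqlContext is available, the registration step would be df.registerTempTable(view_name), as in the answer above. Spark's documentation also describes running SQL on files directly, with a query of the form select col_A, col_B from parquet.`<path>`, which skips the registration step.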