Problem description
What are the differences between Apache Spark SQLContext and HiveContext?
Some sources say that since HiveContext is a superset of SQLContext, developers should always use HiveContext, which has more features than SQLContext. But the current APIs of the two contexts are mostly the same.
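- In which scenarios is SQLContext or HiveContext more useful?
- Is HiveContext more useful only when working with Hive?
- Or is SQLContext all that is needed to implement a Big Data application with Apache Spark?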
Recommended answer
Spark 2.0+
Spark 2.0 provides native window functions (SPARK-8641) and features some additional improvements in parsing and much better SQL 2003 compliance, so it is significantly less dependent on Hive to achieve core functionality. Because of that, HiveContext seems to be slightly less important.
Spark < 2.0
Obviously, if you want to work with Hive you have to use HiveContext. Beyond that, the biggest differences as of now (Spark 1.5) are support for window functions and the ability to access Hive UDFs.
Generally speaking, window functions are a pretty cool feature and can be used to solve quite complex problems concisely, without going back and forth between RDDs and DataFrames. Performance is still far from optimal, especially without a PARTITION BY clause, but that is not really specific to Spark.
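To make the window function point concrete, here is a minimal sketch of a ranking query with a PARTITION BY clause. It runs against an in-memory SQLite database through Python's standard library (SQLite 3.25 or newer is assumed) purely so that it can be executed without a Spark cluster. The table name, column names, and sample rows are all made up for illustration. The SQL is ordinary window syntax; on Spark 1.5 a query of this shape would be submitted through a HiveContext rather than a plain SQLContext.

```python
import sqlite3

# Hypothetical sample data: (department, employee, salary).
ROWS = [
    ("eng", "alice", 120),
    ("eng", "bob", 100),
    ("ops", "carol", 90),
    ("ops", "dave", 95),
]

# Rank employees by salary within each department.
# PARTITION BY restarts the ranking for every department.
QUERY = """
    SELECT dept, name, salary,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY dept, salary_rank
"""


def ranked_salaries(rows):
    """Load rows into an in-memory table and return the ranked result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (dept TEXT, name TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in ranked_salaries(ROWS):
        print(row)
```

Without the window function, the same per-department ranking would typically need a group-and-sort pass outside SQL, which is the back and forth the answer refers to.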
Regarding Hive UDFs, this is not a serious issue now, but before Spark 1.5 many SQL functions were expressed using Hive UDFs and required HiveContext to work.
HiveContext also provides a more robust SQL parser. See, for example: py4j.protocol.Py4JJavaError when selecting nested column in dataframe using select statetment
Finally, HiveContext is required to start the Thrift server.
The biggest problem with HiveContext is that it comes with large dependencies.
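The trade-off for Spark 1.x can be sketched as a small helper: use HiveContext only when one of the features listed in this answer is needed, and fall back to the lighter SQLContext otherwise. This is one possible way to encode the rule, not an official recipe. The feature labels and both function names are hypothetical; only the `SQLContext` and `HiveContext` classes come from PySpark itself, and `sc` is assumed to be an existing SparkContext.

```python
# Features that, per this answer, need HiveContext on Spark 1.5.
# The string labels are hypothetical names chosen for this sketch.
HIVE_ONLY_FEATURES = {
    "hive_tables",
    "window_functions",
    "hive_udfs",
    "thrift_server",
}


def needs_hive_context(features):
    """Return True if any requested feature requires HiveContext."""
    return bool(HIVE_ONLY_FEATURES & set(features))


def make_sql_context(sc, features=()):
    """Build a HiveContext or a plain SQLContext from a SparkContext."""
    # Imported lazily so needs_hive_context() works without PySpark installed.
    from pyspark.sql import HiveContext, SQLContext

    if needs_hive_context(features):
        return HiveContext(sc)
    return SQLContext(sc)
```

For example, `make_sql_context(sc, ["window_functions"])` would return a HiveContext, while `make_sql_context(sc)` would return a plain SQLContext and avoid relying on the Hive dependencies.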