This article explains how to handle the question "Why are so many spark-warehouse folders created?" and should serve as a useful reference for anyone facing the same problem. Let's take a look.

Problem Description

I installed hadoop 2.8.1 on Ubuntu and then installed spark-2.2.0-bin-hadoop2.7 on top of it. I used spark-shell and created tables. Then I used beeline and created tables again. I observed that three different folders named spark-warehouse were created:



1- spark-2.2.0-bin-hadoop2.7/spark-warehouse

2- spark-2.2.0-bin-hadoop2.7/bin/spark-warehouse

3- spark-2.2.0-bin-hadoop2.7/sbin/spark-warehouse



What exactly is spark-warehouse, and why is it created multiple times?
Sometimes my spark-shell and beeline show different databases and tables, and sometimes they show the same ones. I don't understand what is happening.



Furthermore, I did not install Hive, yet I am still able to use beeline, and I can also access the databases through a Java program. How did Hive come to be on my machine?
Please help me. I am new to Spark and installed it by following online tutorials.



Below is the Java code I was using to connect to Apache Spark via JDBC:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class HiveJdbcClient {
        private static String driverName = "org.apache.hive.jdbc.HiveDriver";

        public static void main(String[] args) throws SQLException {
            try {
                // Load the Hive JDBC driver class.
                Class.forName(driverName);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }
            // Empty user name and password, as in the original post.
            Connection con = DriverManager.getConnection("jdbc:hive2://10.171.0.117:10000/default", "", "");
            Statement stmt = con.createStatement();
        }
    }

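For reference, a hedged sketch of how such a statement is typically used once created. This continues inside main, additionally needs java.sql.ResultSet imported, and the SHOW TABLES query is illustrative rather than from the original post:

        // List the tables visible over this connection.
        ResultSet rs = stmt.executeQuery("SHOW TABLES");
        while (rs.next()) {
            System.out.println(rs.getString(1)); // first column holds the table name
        }
        rs.close();
        con.close();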

Solution

Unless configured otherwise, Spark will create an internal Derby database named metastore_db along with a derby.log file. It looks like you have not changed that. This is the default behavior, as pointed out in the documentation.
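As a minimal sketch of how to pin these locations (assuming a standalone Java application; the path /tmp/spark-warehouse and the class/app names are placeholders, not from the original post), the warehouse directory can be set when the SparkSession is built, so every session resolves the same folder regardless of the working directory it was launched from:

    import org.apache.spark.sql.SparkSession;

    public class FixedWarehouseExample {
        public static void main(String[] args) {
            // Pin the warehouse to an absolute path instead of the default
            // ./spark-warehouse relative to the current working directory.
            SparkSession spark = SparkSession.builder()
                    .appName("FixedWarehouseExample")
                    .master("local[*]")  // assumption: local mode for demonstration
                    .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
                    .enableHiveSupport()
                    .getOrCreate();

            spark.sql("SHOW DATABASES").show();
            spark.stop();
        }
    }

The same setting can also be passed on the command line, for example spark-shell --conf spark.sql.warehouse.dir=/tmp/spark-warehouse.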





"Sometimes my spark shell and beeline show different databases and tables and sometimes they show the same"

You are starting those commands in different folders, so what you see is confined to the current working directory: each session creates its own metastore_db and spark-warehouse relative to wherever it was launched, which is exactly why the three folders listed above appeared.

"How did Hive come on my machine?"

It didn't. You are probably connecting either to the Spark Thrift Server, which is fully compatible with the HiveServer2 protocol, to the embedded Derby database mentioned above, or to an actual HiveServer2 instance sitting at 10.171.0.117.



In any case, the JDBC connection is not required here. You can use the SparkSession.sql function directly.
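For illustration, a minimal sketch of that JDBC-free approach (the class and app names are placeholders, and SHOW TABLES is just an example query):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DirectSqlExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("DirectSqlExample")
                    .master("local[*]")  // assumption: local mode
                    .enableHiveSupport()
                    .getOrCreate();

            // Run SQL through the session itself -- no HiveDriver or JDBC URL needed.
            Dataset<Row> tables = spark.sql("SHOW TABLES");
            tables.show();

            spark.stop();
        }
    }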


This concludes this article on why so many spark-warehouse folders are created. We hope the answer above is a helpful reference, and thank you for your support!
