Problem description
When loading data from HDFS into Hive with the command
LOAD DATA INPATH 'hdfs_file' INTO TABLE tablename;
it looks like the hdfs_file is moved into the hive/warehouse directory. Is it possible (and how?) to copy the file instead of moving it, so that it can still be used by another process?
Recommended answer
From your question I assume that you already have your data in HDFS, so you don't need LOAD DATA, which moves the files to the default Hive location /user/hive/warehouse. You can simply define the table using the external keyword, which leaves the files in place but creates the table definition in the Hive metastore. See the Create Table DDL documentation. For example:
create external table table_name (
id int,
myfields string
)
location '/my/location/in/hdfs';
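A practical consequence of the external keyword, sketched with the table and path names from the example above: Hive only records metadata for the table, so even dropping it does not touch the underlying files.

```sql
-- Query the data exactly as you would a managed table:
select count(*) from table_name;

-- Dropping an EXTERNAL table removes only the metastore entry;
-- the files under /my/location/in/hdfs remain in place,
-- so other processes can keep reading them.
drop table table_name;
```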
Please note that the format of your files may differ from Hive's default (as JigneshRawal mentioned in the comments). You can specify your own delimiter, for example when the data was produced by Sqoop:
row format delimited fields terminated by ','
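Putting the pieces together, the full DDL with a custom delimiter might look like this (a sketch reusing the table, column, and path names from the example above):

```sql
create external table table_name (
  id int,
  myfields string
)
row format delimited fields terminated by ','
location '/my/location/in/hdfs';
```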
This concludes the article on how to load data from HDFS into Hive without removing the source file. We hope the answer above is helpful.