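I want to run a simple insert query against a Hive table.

I created the table with create table t(id int, f1 String, f2 int); and then tried to insert a row with insert into t values (1, '123', 1);
The job is created, but it never executes.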

Query ID = hadoop_20200518194705_4ec47375-e5e8-4d33-80d8-ed183aacb0c2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1589556481112_0007, Tracking URL = http://hadoop_hose:8088/proxy/application_1589556481112_0007/
Kill Command = /home/hadoop/hadoop-3.1.3/bin/mapred job  -kill job_1589556481112_0007

What exactly am I doing wrong?

UPD:
Listing from the GUI: [screenshot]

Best answer

You should look into ACID and transactions in Hive.
See the following link:

https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

The Hive transaction manager should be set to DbTxnManager:

SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

We also need to enable concurrency:
SET hive.support.concurrency=true;

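After setting the properties above, we should be able to insert data into any table.
For updates and deletes, the table should be bucketed, and the file format must be ORC or another ACID-compliant format.
We also need to set the table property transactional to true: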
TBLPROPERTIES ('transactional'='true');
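
Applied to the table from the question, the settings above could look like the following sketch. The table name t_acid and the bucket count of 4 are assumptions made for illustration; the column list is taken from the question's table t. It also assumes a Hive version with ACID support and a metastore that is set up for transactions.

```sql
-- Session settings from the answer above
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;

-- Hypothetical transactional counterpart of the question's table t
-- (t_acid and the bucket count are illustrative, not from the question)
CREATE TABLE t_acid (
  id INT,
  f1 STRING,
  f2 INT
) CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- The same row the question tried to insert
INSERT INTO t_acid VALUES (1, '123', 1);
```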

Here is an example.

First, review the current properties:
# REVIEW PROPERTIES
$ cd /etc/hive/conf
$ grep -i txn hive-site.xml
$ hive -e "SET;" | grep -i txn
$ beeline -u jdbc:hive2://localhost:10000

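-- Print the current value (the next line is the output)
SET hive.txn.manager;
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;

-- Switch to the transaction manager that supports ACID operations
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;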

SET hive.support.concurrency=true;

-- Per the Hive Transactions wiki, not required as of Hive 2.0
SET hive.enforce.bucketing;
SET hive.enforce.bucketing=true;

SET hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict

SET hive.exec.dynamic.partition.mode=nonstrict;

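-- Compactor settings; the Hive Transactions wiki lists these as metastore-side
-- properties, so they may need to go in hive-site.xml rather than a session
SET hive.compactor.initiator.on;
SET hive.compactor.initiator.on=true;
-- Must be a positive number
SET hive.compactor.worker.threads;
SET hive.compactor.worker.threads=1;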

CREATE TABLE orders_transactional (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) CLUSTERED BY (order_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES("transactional"="true");

INSERT INTO orders_transactional VALUES
(1, '2013-07-25 00:00:00.0', 1000, 'COMPLETE');

INSERT INTO orders_transactional VALUES
(2, '2013-07-25 00:00:00.0', 2001, 'CLOSED'),
(3, '2013-07-25 00:00:00.0', 1500, 'PENDING'),
(4, '2013-07-25 00:00:00.0', 2041, 'PENDING'),
(5, '2013-07-25 00:00:00.0', 2031, 'COMPLETE');

UPDATE orders_transactional
  SET order_status = 'COMPLETE'
WHERE order_status = 'PENDING';

DELETE FROM orders_transactional
WHERE order_status <> 'COMPLETE';

SELECT *
FROM orders_transactional;
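
To check that the example behaved as intended, a few follow-up statements may help. This is a sketch only: it assumes the orders_transactional table from the example above started out empty and that every statement succeeded.

```sql
-- Confirm the table is flagged as transactional
SHOW TBLPROPERTIES orders_transactional;

-- List open and aborted transactions known to the metastore
SHOW TRANSACTIONS;

-- List compactions that are scheduled, running, or finished
SHOW COMPACTIONS;

-- Count the remaining rows per status
SELECT order_status, COUNT(*) AS cnt
FROM orders_transactional
GROUP BY order_status;
```

Under those assumptions, the last query should return a single group, COMPLETE with a count of 4: orders 3 and 4 are updated from PENDING to COMPLETE, and order 2 (CLOSED) is removed by the DELETE.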

This question, "hadoop - insert into Hive using QL not running", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/61874740/