Suppose we have a table with 2 columns in SQL
(the SQL source table has no created_date, updated_date, or flag columns, and we are not allowed to modify the source table).
id is the primary key.
id name
1 AAAAA
2 BBBBB
3 CCCCC
4 ADAEAB
5 GGAGAG
I used Sqoop to pull this data into Hive as the main table, and that works fine.
But suppose the source data is then updated as follows:
id name
1 ACACA
2 BASBA
3 CCHAH
4 AASDA1
5 GGAGAG
Problem:
Without affecting the data already in the Hive main table, I need to pull the
updated, inserted, or deleted data using Sqoop and
simultaneously apply those changes to the Hive main table without affecting the
existing rows.
I have tried the
--incremental options and related properties, but with no result.
Result:
The output main table has all 10 records; it should have 5 records.
If we have many more records, such as millions, what is the solution?
Requirement:
On day 1 I have 1 million records.
On day 2 I have the day 1 million plus the current day's new and updated records, let's say 2 million in total.
On day 2 I have to pull only the updated and newly inserted data rather than the whole data set.
Also,
can anyone help me combine the day 1 Hive data with the day 2 updated data?
If anyone has another solution or an alternative, please explain it
clearly, because I am new to Hadoop.
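
To make the expected result concrete, here is a small Python sketch (plain Python, not Sqoop or Hive code) using the sample rows above. It contrasts appending the day 2 rows with merging them on the primary key. The helper name merge_by_key is hypothetical and only for illustration.

```python
# Day 1 rows already loaded into the Hive main table (values from the question).
day1 = [(1, "AAAAA"), (2, "BBBBB"), (3, "CCCCC"), (4, "ADAEAB"), (5, "GGAGAG")]
# Day 2 rows as they now appear in the source (values from the question).
day2 = [(1, "ACACA"), (2, "BASBA"), (3, "CCHAH"), (4, "AASDA1"), (5, "GGAGAG")]

# Plain append: every pulled row is added, so each id ends up in the table twice.
appended = day1 + day2


def merge_by_key(base, incoming):
    """Hypothetical helper: keep one row per id, the incoming row wins on conflict."""
    merged = dict(base)            # id -> name from the existing table
    merged.update(dict(incoming))  # overwrite changed ids, add new ids
    return sorted(merged.items())


merged = merge_by_key(day1, day2)

print(len(appended))  # row count after a plain append
print(len(merged))    # row count after merging on id
for row in merged:
    print(row)
```

The append gives 10 rows, which is the behaviour I am seeing, while the merge on id gives the 5 rows I want. Note that this merge alone does not remove rows that were deleted in the source.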
Best answer
Please refer to the following link:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
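
One caveat: Sqoop's --incremental lastmodified mode needs a timestamp column as its --check-column, and --incremental append only picks up rows whose check column value has grown, so it does not see updates to existing rows. The table in the question has neither kind of column. A possible fallback, offered here as an assumption and not taken from the linked page, is to import a full snapshot into a staging table and compare it with the main table by primary key. The comparison logic is sketched below in plain Python with the sample data from the question; diff_snapshots is a hypothetical name.

```python
def diff_snapshots(old, new):
    """Classify ids by comparing two full snapshots given as dicts of id -> name."""
    inserted = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    return inserted, updated, deleted


# Snapshots built from the sample rows in the question.
old = {1: "AAAAA", 2: "BBBBB", 3: "CCCCC", 4: "ADAEAB", 5: "GGAGAG"}
new = {1: "ACACA", 2: "BASBA", 3: "CCHAH", 4: "AASDA1", 5: "GGAGAG"}

inserted, updated, deleted = diff_snapshots(old, new)
print("inserted:", inserted)
print("updated:", updated)
print("deleted:", deleted)
```

With the sample data this reports ids 1 to 4 as updated and nothing inserted or deleted. The trade-off is that this approach still pulls the whole source table each time, which is exactly what the question hoped to avoid; without a timestamp or change flag in the source, though, deletes and in-place updates cannot be identified from the source rows alone.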
Regarding "sql - Importing only updated records from SQL into Hive", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24216430/