Problem description
I have an application that does some basic ETL using Camel routes. Each route is configured to take some data from one table, do some transformation, and save it into the same table in a different schema. So there is a one-to-one relationship between a Camel route and a table.
Say I have two routes:
// Inside RouteBuilder.configure(); "..." stands for the elided column list and parameters
from("direct:table_1").routeId("table1Route")
    .setBody(constant("SELECT * FROM table_1"))   // setBody needs an Expression, not a bare String
    .to("jdbc:source_schema")
    .split(body()).streaming()
        .process("someProcessor")
        .to("sql:INSERT INTO table_1 ... ?dataSource=target_schema");

from("direct:table_2").routeId("table2Route")
    .setBody(constant("SELECT * FROM table_2"))
    .to("jdbc:source_schema")
    .split(body()).streaming()
        .process("someProcessor")
        .to("sql:INSERT INTO table_2 ... ?dataSource=target_schema");
Everything runs OK, and the data is moved into the target schema when a "start processing" message is sent to both the direct:table_1 and direct:table_2 endpoints.
However, looking at the logs, I can see that table 2 records only start being moved after the table 1 records are finished. That is definitely a no-go for my application, as some tables are quite big and moving one table at a time would take a very long time to run.
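Why table 2 waits for table 1 can be modeled without Camel. A direct: endpoint invokes the consuming route synchronously on the caller's thread, so, assuming the two "start processing" messages are sent one after the other from a single thread (the question does not show that code), the second send cannot begin until the first route has copied its whole table. The sketch below is a plain-JDK stand-in for that behaviour, not Camel code; `copyTable` and `runSequentially` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK model (no Camel) of sending to direct:table_1 and then direct:table_2
// from one thread: each "send" runs the whole route before it returns.
final class DirectModel {

    // Stand-in for one route: "copy" a table and record when it starts and ends.
    static void copyTable(String table, List<String> events) {
        events.add("start " + table);
        // ... SELECT, split, process and INSERT would happen here ...
        events.add("end " + table);
    }

    // Stand-in for the caller: the loop cannot reach table_2 until table_1 is done.
    static List<String> runSequentially(List<String> tables) {
        List<String> events = new ArrayList<>();
        for (String table : tables) {
            copyTable(table, events); // synchronous, like a direct: call
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runSequentially(List.of("table_1", "table_2")));
        // prints [start table_1, end table_1, start table_2, end table_2]
    }
}
```

This event order is the same pattern the logs show: the second table's work never overlaps the first.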
My question is: what am I doing wrong, and how can I address this so that the data movement happens in parallel?
Recommended answer
I would try something like this:
from("direct:start").multicast().parallelProcessing().to("seda:table1", "seda:table2");
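Basically, I have:
- used multicast to send to multiple recipients, with parallelProcessing so that it tries to send to both endpoints in parallel.
- replaced your direct endpoints with seda endpoints (your routes would then start with from("seda:...") instead of from("direct:...")). If you don't need synchronous endpoints, seda is the better choice.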
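What multicast().parallelProcessing() does can be modeled in plain Java: each recipient gets its own copy of the message on a pool thread, and the multicast continues only once every recipient has finished. The sketch below is a JDK-only model under that assumption, not Camel's implementation; `multicastInParallel` is a hypothetical name, and the endpoint URIs are just strings here. With seda: recipients and a fire-and-forget message, the per-recipient work should mostly be handing the exchange to that route's own queue, so the actual copying happens on the seda consumer threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Plain-JDK model (not Camel code) of a parallel multicast to several recipients.
final class ParallelMulticastModel {

    // Returns the recipients that saw all the other recipients running at the same time.
    static List<String> multicastInParallel(List<String> recipients) {
        if (recipients.isEmpty()) {
            return List.of();
        }
        ExecutorService pool = Executors.newFixedThreadPool(recipients.size());
        // Every recipient counts down once it starts, then waits for the rest to start.
        CountDownLatch allStarted = new CountDownLatch(recipients.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String recipient : recipients) {
                tasks.add(() -> {
                    allStarted.countDown();
                    // Only true if the other recipients are running concurrently.
                    boolean overlapped = allStarted.await(5, TimeUnit.SECONDS);
                    return overlapped ? recipient : null;
                });
            }
            List<String> overlapping = new ArrayList<>();
            // invokeAll blocks until every task is done, like the multicast waiting for its recipients.
            for (Future<String> result : pool.invokeAll(tasks)) {
                String name = result.get();
                if (name != null) {
                    overlapping.add(name);
                }
            }
            return overlapping;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(multicastInParallel(List.of("seda:table1", "seda:table2")));
        // prints [seda:table1, seda:table2]
    }
}
```

A latch is used instead of timing measurements so that "the recipients ran at the same time" is checked deterministically rather than inferred from durations.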
You can also experiment with the .threads() syntax for multithreading.
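The idea behind .threads() is that the rest of the route runs on a thread pool instead of the thread that sent the message. The sketch below models only the fire-and-forget case in plain Java: the sender hands each table's work to a pool and returns at once, so both copies can be in flight together. Whether a real sender returns early after .threads() depends on the message exchange pattern and on how the message is sent (a request-reply call still waits for the result), so check this against your Camel version. `send` and `sendsDoNotBlock` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-JDK model (not Camel code) of a route whose remaining steps run on a pool,
// assuming a fire-and-forget message whose reply nobody waits for.
final class ThreadsModel {

    // Hypothetical stand-in for sending "start processing" to one table's route.
    static Future<String> send(ExecutorService pool, String table, CountDownLatch release) {
        return pool.submit(() -> {
            release.await(); // hold the "copy" until the caller releases it
            // ... split / process / INSERT would run here, on a pool thread ...
            return table + " copied";
        });
    }

    // True if both sends returned before either copy finished, and both copies then completed.
    static boolean sendsDoNotBlock() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        try {
            Future<String> t1 = send(pool, "table_1", release);
            Future<String> t2 = send(pool, "table_2", release);
            boolean returnedEarly = !t1.isDone() && !t2.isDone();
            release.countDown(); // let both copies run to completion
            return returnedEarly
                    && t1.get().equals("table_1 copied")
                    && t2.get().equals("table_2 copied");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sendsDoNotBlock()); // true
    }
}
```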
If you want to compute your table endpoints at runtime, you can replace .multicast() with .recipientList().
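With .recipientList(), the recipients come from an expression evaluated for each message rather than from a fixed list written into the route. The sketch below models only the computation such an expression might perform, in plain Java: it assumes, hypothetically, that the tables to copy arrive with the message as a comma-separated string and maps each name to a seda: endpoint URI. `endpointsFor` is a hypothetical helper, not part of Camel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-JDK model of the endpoint list a recipient-list expression might compute at runtime.
final class RecipientListModel {

    // Hypothetical helper: "table_1, table_2" -> [seda:table_1, seda:table_2]
    static List<String> endpointsFor(String tables) {
        return Arrays.stream(tables.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty()) // ignore stray commas
                .map(name -> "seda:" + name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(endpointsFor("table_1, table_2,table_3"));
        // prints [seda:table_1, seda:table_2, seda:table_3]
    }
}
```

Adding a new table then only changes the data sent with the message, not the route definition.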