Problem description
I have a batch job that reads data from bulk files, processes it, and inserts it into a database.
I'm using Spring Batch's partitioning feature with the default partition handler.
<bean class="org.spr...TaskExecutorPartitionHandler">
    <property name="taskExecutor" ref="taskExecutor" />
    <property name="step" ref="readFromFile" />
    <property name="gridSize" value="10" />
</bean>
What is the significance of gridSize here? I have configured it so that it is equal to the concurrency of the taskExecutor.
Recommended answer
gridSize specifies the number of data blocks to create, which are then processed by (usually) the same number of workers. Think of it as the number of mapped data blocks in a map/reduce.
Using a StepExecutionSplitter, the PartitionHandler "partitions" (splits) the given data into gridSize parts and sends each part to an independent worker, which in your case is a thread.
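To make the split-and-dispatch idea concrete, here is a minimal plain-Java sketch. It does not use Spring Batch at all: the class PartitionDispatchSketch and its split and process methods are hypothetical names made up for illustration, and a fixed thread pool stands in for the taskExecutor. It assumes gridSize is positive and that the data fits in an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionDispatchSketch {

    // Splits items into at most gridSize contiguous chunks of near-equal size.
    static <T> List<List<T>> split(List<T> items, int gridSize) {
        List<List<T>> parts = new ArrayList<>();
        int chunk = (items.size() + gridSize - 1) / gridSize; // ceiling division
        for (int start = 0; start < items.size(); start += chunk) {
            int end = Math.min(start + chunk, items.size());
            parts.add(new ArrayList<>(items.subList(start, end)));
        }
        return parts;
    }

    // Hands each chunk to a worker thread and returns the number of items each worker saw.
    static List<Integer> process(List<String> items, int gridSize) throws Exception {
        // Pool size equals gridSize, mirroring "concurrency equal to gridSize" from the question.
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> part : split(items, gridSize)) {
                Callable<Integer> work = part::size; // stand-in for the real step logic
                futures.add(pool.submit(work));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get()); // wait for every partition to finish
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("row-" + i);
        }
        System.out.println(process(rows, 5)); // prints [2, 2, 2, 2, 2]
    }
}
```

If the pool were smaller than gridSize, the same number of chunks would still be created, but some of them would wait in the queue until a thread became free.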
For example, suppose you have 10 rows in the DB that need to be processed. If you set gridSize to 5 and use a straightforward partitioning logic, you end up with 10 / 5 = 2 rows per partition, so 5 threads work concurrently on 2 rows each.
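The arithmetic in that example can be sketched as a small range calculation. This is only an illustration of one "straightforward" partitioning scheme in plain Java, not Spring Batch's own code: the class RowRangeSketch, the ranges method, and the partition0, partition1, ... names are hypothetical, and rows are assumed to be numbered 1 to totalRows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowRangeSketch {

    // Returns partition name -> {firstRow, lastRow}, 1-based and inclusive.
    static Map<String, int[]> ranges(int totalRows, int gridSize) {
        Map<String, int[]> result = new LinkedHashMap<>();
        int base = totalRows / gridSize;  // rows that every partition gets
        int extra = totalRows % gridSize; // the first 'extra' partitions get one more row
        int first = 1;
        for (int i = 0; i < gridSize; i++) {
            int size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // fewer rows than partitions: stop instead of creating empty ranges
            }
            result.put("partition" + i, new int[] {first, first + size - 1});
            first += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 rows, gridSize 5: five ranges of two rows each (1..2, 3..4, ..., 9..10).
        ranges(10, 5).forEach((name, r) ->
                System.out.println(name + ": rows " + r[0] + ".." + r[1]));
    }
}
```

When the row count is not a multiple of gridSize, this sketch gives the leftover rows to the first partitions, so ranges(10, 3) yields 1..4, 5..7 and 8..10. Other schemes are equally valid; the point is only that gridSize decides how many ranges exist.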