Problem description
I have some code that processes around 30,000 records. The basic outline is like this:
startRecordID = 2345;
endRecordID = 32345;
for (recordID = startRecordID; recordID <= endRecordID; recordID++) {
    // process record...
}
Now, this processing takes a long time, and I'd like to have a thread pool of 15 threads and give each thread a list of recordIDs to process, and then join them all at the end.
In the past I accomplished this with code that looked something like this, where recordLists was an array of sub-arrays, each containing 1/15 of the records to be processed:
<cfset numThreads = 15 />
<!--- keep a running list of threads so we can join them all at the end --->
<cfset threadlist = "" />
<cfloop from="1" to="#numThreads#" index="threadNum">
    <cfset threadName = "recordProcessing_#threadNum#" />
    <cfset threadlist = listAppend(threadlist, threadName) />
    <!--- pass this thread's sub-array in through the attributes scope --->
    <cfthread action="run" name="#threadName#" recordList="#recordLists[threadNum]#">
        <cfloop from="1" to="#ArrayLen(attributes.recordList)#" index="recordIndex">
            <cfset recordID = attributes.recordList[recordIndex] />
            ... process recordID ...
        </cfloop>
    </cfthread>
</cfloop>
<!--- Join all threads before continuing --->
<cfthread action="join" name="#threadlist#" timeout="4000"/>
This worked well (although I would now convert this old code to cfscript :) ), but creating the recordLists array of sub-arrays is not so simple. The only way I can think of is to loop through the numbers from startRecordID to endRecordID, add each to an array, then run an ArrayDivide function (which we have already defined in our codebase) on it to split it into numThreads (in this case 15) equal sub-arrays. Considering that I have the start of the range, the end of the range, and the number of threads I want to divide it among, isn't there a simpler way to break it up and assign it to the threads?
Recommended answer
(From the comments...)
If you already have an array, why loop through it again? There are no built-in functions, but since an array is a Java List, a simple yourArray.subList(startIndex, endIndex) would do the trick. Obviously, add some error handling in case the number of records is less than the number of processing threads.
NB: Since it is a Java method, indexes start at zero (0) and the endIndex is exclusive. Also, the result behaves like a CF array in most respects. However, it is immutable, i.e. it cannot be modified.
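To make the indexing concrete, here is a minimal sketch with made-up sample data (the five IDs are hypothetical, purely for illustration). It assumes the CF array exposes the Java List methods, as described above, and uses the Java size() and get() methods on the result so that it does not rely on any particular CF array function accepting the sub-list:

```cfml
<cfscript>
// hypothetical sample data, for illustration only
recordsArray = [101, 102, 103, 104, 105];

// CF positions 2 through 4 (1-based, inclusive) correspond to
// Java startIndex 1 (zero-based) and endIndex 4 (exclusive)
subArray = recordsArray.subList(1, 4);

writeOutput(subArray.size());   // 3
writeOutput("<br>");
writeOutput(subArray.get(0));   // 102, the first element of the sub-list
</cfscript>
```

If the engine complains about the argument types, wrapping the indexes in javaCast("int", ...) is a common workaround.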
<cfscript>
    // calculate how many records to process in each batch
    numOfIterations = 15;
    totalRecords = arrayLen(recordsArray);
    batchSize = ceiling(totalRecords/numOfIterations);
    for (t=0; t < numOfIterations; t++) {
        // calculate sub array positions
        startAt = t * batchSize;
        // stop early if the records run out before the batches do
        if (startAt >= totalRecords) {
            break;
        }
        endAt = Min(startAt+batchSize, totalRecords);
        // get next batch of records
        subArray = recordsArray.subList(startAt, endAt);
        // kick off a thread and do whatever you want with the array ...
        WriteOutput("<br>Batch ["& t &"] startAt="& startAt &" endAt="& endAt);
    }
</cfscript>
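Since the question starts from a numeric range rather than an existing array, the same batch arithmetic can also be applied to the IDs directly, so no array has to be built at all. The following is only a sketch under that assumption: firstID and lastID are names introduced here for illustration, the record processing itself is left elided as in the question, and the 4000 ms join timeout is simply carried over from the original tag-based code.

```cfml
<cfscript>
startRecordID = 2345;
endRecordID   = 32345;
numThreads    = 15;

// the range is inclusive on both ends, hence the + 1
totalRecords = endRecordID - startRecordID + 1;
batchSize    = ceiling(totalRecords / numThreads);

// keep a running list of threads so we can join them all at the end
threadList = "";

for (t = 0; t < numThreads; t++) {
    // first ID handled by this thread (firstID/lastID are hypothetical names)
    firstID = startRecordID + (t * batchSize);

    // fewer records than threads: stop once the range is exhausted
    if (firstID > endRecordID) {
        break;
    }

    // last ID handled by this thread, clamped to the end of the range
    lastID = min(firstID + batchSize - 1, endRecordID);

    threadName = "recordProcessing_" & (t + 1);
    threadList = listAppend(threadList, threadName);

    // pass the bounds as attributes so each thread gets its own copy
    thread action="run" name="#threadName#" firstID="#firstID#" lastID="#lastID#" {
        for (var recordID = attributes.firstID; recordID <= attributes.lastID; recordID++) {
            // ... process recordID ...
        }
    }
}

// join all threads before continuing
thread action="join" name="#threadList#" timeout="4000";
</cfscript>
```

With the values from the question this gives 30,001 records and a batch size of 2,001, so the first 14 threads each take 2,001 IDs and the last one takes the remaining 1,987.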