This article describes how to deal with Bash concurrent jobs getting stuck. It should be a useful reference if you have run into the same problem; follow along below.

Problem Description

I've implemented a way to have concurrent jobs in bash, as seen here.

I'm looping through a file with around 13000 lines. I'm just testing and printing each line, as such:

#!/bin/bash
max_bg_procs(){
    if [[ $# -eq 0 ]] ; then
        echo "Usage: max_bg_procs NUM_PROCS.  Will wait until the number of background (&)"
        echo "           bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
        return
    fi
    local max_number=$((0 + ${1:-0}))
    while true; do
        local current_number=$(jobs -pr | wc -l)
        if [[ $current_number -lt $max_number ]]; then
                echo "success in if"
                break
        fi
        echo "has to wait"
        sleep 4
    done
}

download_data(){
    echo "link #" $2 "["$1"]"
}

mapfile -t myArray < $1

i=1
for url in "${myArray[@]}"
do
    max_bg_procs 6
    download_data $url $i &
    ((i++))
done
echo "finito!"

I've also tried other solutions such as this and this, but my issue is persistent:

At a "random" given step, usually between the 2000th and the 5000th iteration, it simply gets stuck. I've put those various echo in the middle of the code to see where it would get stuck but it the last thing it prints is the $url $i.

I've done a simple test, removing any parallelism and just looping over the file contents: everything went fine and it looped to the end.

So this makes me think I'm missing some limitation on the parallelism, and I wonder if anyone could help me figure it out.

Many thanks!

Solution

Here, we have up to 6 parallel bash processes calling download_data, each of which is passed up to 16 URLs per invocation. Adjust both numbers to suit your own tuning.

Note that this expects both bash (for exported function support) and GNU xargs.

#!/usr/bin/env bash
#              ^^^^- not /bin/sh

download_data() {
  echo "link #$2 [$1]" # TODO: replace this with a job that actually takes some time
}
export -f download_data
<input.txt xargs -d $'\n' -P 6 -n 16 -- bash -c 'for arg; do download_data "$arg"; done' _
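
For a quick sanity check, a minimal test run might look like the following. This is only an illustration: it assumes the snippet above has been saved as parallel_download.sh and made executable, and it fabricates input.txt (the file name the script reads from) with 13000 fake URLs.

# generate a 13000-line test input, one fake URL per line
seq 1 13000 | sed 's|^|http://example.com/file|' > input.txt

# run the script and count the output lines; with the placeholder
# download_data above, this should print 13000
./parallel_download.sh | wc -l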

This concludes the article on Bash concurrent jobs getting stuck. We hope the answer above helps, and thank you for your support!
