linux - How to run multiple instances in parallel in a shell script to improve time efficiency

This question already has answers here:

Parallel processing or threading in Shell scripting

(3 answers)

executing shell command in background from script

(4 answers)

Closed last year.

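I am working with a shell script that reads a 16,000-line input file. The script takes more than 8 hours to run. I need to cut that down, so I split the input into 8 files: a for loop iterates over the 8 files, and an inner while loop reads the records from each file. But this does not work.
How can I run 8 instances in parallel in the background?
I need help making it run more efficiently, for example by using functions or forked processes.

Here is the code: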

for file in "$MY_WORK/CCN_split_files"/*
do
    echo "$file"
    echo "begin read loop"
    ### removing the header record from the file ###
    if [ "$file" == "$MY_WORK/CCN_split_files/ccn.email.list.file00" ]
    then
        mv $MY_WORK/CCN_split_files/ccn.email.list.file00 $MY_WORK/raw_file
        sed -e '/ Regular  /d; / Duplicate  /d' $MY_WORK/raw_file > $MY_WORK/CCN_split_files/ccn.email.list.file00
    fi
    ### end of removing header record  ###

    while read -r record
    do
      reccount=$(( reccount + 1 ))

        ### parse input record

          contact_email=`echo "$record" | cut -f5 -d ''`
              echo "contact email is $contact_email"
          credit_card_id=`echo "$record" | cut -f6 -d ''`
              echo "credit card id is $credit_card_id"
          ref_nr=`echo "$record" | cut -f7 -d ''`
              echo "reference nr is $ref_nr"
          cny_cd=`echo "$record" | cut -f8 -d ''`
              echo "country code is $cny_cd"
          lang=`echo "$record" | cut -f9 -d ''`
              echo "language is $lang"
          pmt_ir=`echo "$record" | cut -f13 -d ''`
              echo "payment ir is $pmt_ir"

        ### set paypal or credit card

          if [ "$pmt_ir" = "3" ]
            then
              pmt_typ="PP"
              echo "payment type is $pmt_typ"
          else
              pmt_typ="CC"
              echo "payment type is $pmt_typ"
          fi

        ### retrieve doc from application

          echo "retrieve from CMOD for $ref_nr"
          GetExit01Cntr=0
          GetExit01='F'
          until [[ $GetExit01 = 'T' ]]
           do
            GetExit01Cntr=`expr $GetExit01Cntr + 1`

            /opt/ondemand/bin/arsdoc get -ac -d $MY_WORK -h $host -u $user -p $pwd -v -i  "WHERE ReferenceNumber='$ref_nr' AND CreditCardId='$credit_card_id'" -f "$folder" -L1 -o "$notify_afp" -v 2> $MY_WORK/$arsdoc_out
            if grep "Retrieving 1 document(s)." $MY_WORK/$arsdoc_out > /dev/null
            then
               GetExit01='T'
               echo "CCN AFP retrieval successful"
            else
               echo "CCN AFP retrieval failed - Performing retry (${GetExit01Cntr})"
               sleep 30
               GetExit01='F'
               if [[ $GetExit01Cntr -ge 3 ]]
               then
                  echo "Max Retry Failure: (GetExit01) - Failed to successfully perform arsdoc get"
                  echo "CCN AFP retrieval failed"
                  echo "CCN AFP retrieval failed" >> $MY_WORK/$logfile
                  exit 12
               fi
            fi
           done

        ### convert to PDF

          echo "afp2pdf conversion begins"

          /a585/app/AFP2PDF_PLUS/afp2pdf.sh -i /a585/app/AFP2PDF_PLUS/a2pxopts2.cfg -n /a585/app/AFP2PDF_PLUS/font -o $MY_WORK/$notify_pdf $MY_WORK/$notify_afp > $MY_WORK/$afp2pdf_out 2>&1

          ReturnCode=`echo $?`
          if [ "$ReturnCode" != "0" ]
            then
             echo "afp2pdf failed"
             echo "afp2pdf failed" >> $MY_WORK/$logfile
             exit 12
          fi

        ### assign message text, subject, and reply address variables

          echo "assign message text, subject, reply"
          if [ $cny_cd = "US" ] && [ $lang = "EN" ] && [ $pmt_typ = "CC" ]
            then
               email_text=$MSG_PATH/ccnotifyusen.new
               email_reply="[email protected]"
               email_subject=" Credit Card Billing Adjustment. Ref# $ref_nr"

             elif [ $cny_cd = "CA" ] && [ $lang = "EN" ] && [ $pmt_typ = "CC" ]
               then
                 email_text=$MSG_PATH/ccnotifycaen.new
                 email_reply="[email protected]"
                 email_subject="Credit Card Billing Adjustment. Ref# $ref_nr"

             elif [ $cny_cd = "CA" ] && [ $lang = "FR" ] && [ $pmt_typ = "CC" ]
               then
                 email_text=$MSG_PATH/ccnotifycafr.new
                 email_reply="[email protected]"
                 email_subject=" Rajustement des frais. Ref. $ref_nr"

             elif [ $cny_cd = "US" ] && [ $lang = "EN" ] && [ $pmt_typ = "PP" ]
               then
                 email_text=$MSG_PATH/ppnotifyusen.new
                 email_reply="[email protected]"
                 email_subject=" Billing Adjustment. Ref# $ref_nr"

             elif [ $cny_cd = "CA" ] && [ $lang = "EN" ] && [ $pmt_typ = "PP" ]
               then
                 email_text=$MSG_PATH/ppnotifycaen.new
                 email_reply="[email protected]"
                 email_subject=" Billing Adjustment. Ref# $ref_nr"

             elif [ $cny_cd = "CA" ] && [ $lang = "FR" ] && [ $pmt_typ = "PP" ]
               then
                 email_text=$MSG_PATH/ppnotifycafr.new
                 email_reply="[email protected]"
                 email_subject_text=`cat $MSG_PATH/ppsubjectcafr`
                 email_subject="$email_subject_text $ref_nr"

             else
               echo "invalid country, language, payment type combination: $cny_cd, $lang, $pmt_typ"
               echo "invalid country, language, payment type combination: $cny_cd, $lang, $pmt_typ" >> $MY_WORK/$logfile
               exit 12
          fi

        ### overlay reply address in .muttrc initialization file

          cd /a585/app/script/
          echo "email via NSGalinaMail"

          /usr/bin/java -jar NSGalinaMail.jar "$email_text"  "$email_subject" "$contact_email" "[email protected]" $lang  $cny_cd $MY_WORK/$notify_pdf
          if [ $? -eq 0 ]; then
              emailCountSuccess[$reccount-1]="Success: Email to $contact_email for $ref_nr"
           else
              emailCountFailure[$reccount-1]="Failure: Email to $contact_email for $ref_nr"
           fi

    done < $file
done

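Best answer

If you want to get a lot of work done in parallel, consider using GNU Parallel. There is a great PDF that explains how to use it. Specifically, I am using "Section 9 - Pipe mode" to answer your question.

I am not rewriting all of your code for you, just showing you some ideas.

Let's generate a sample file with 16,000 lines to match yours:

seq 16000 > YourFile

Now let's create a dummy script called YourScript to process your data, like this: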

#!/bin/bash
lines=$(wc -l < /dev/stdin)
echo "Called to process $lines lines"
sleep 2

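As you can see, it just counts the lines it receives on stdin, tells you how many there are, and sleeps for 2 seconds so that you can see what is happening. Make it executable with: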

chmod +x YourScript
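To move from the dummy script towards real work, the body of YourScript could loop over the records it receives on stdin. This is only a sketch: the function name process_records is made up for illustration, and the printf inside the loop stands in for the retrieve, convert and email steps from the question.

```shell
#!/bin/bash
# Hypothetical worker body: handles every record that arrives on stdin.
process_records() {
    local record count=0
    while IFS= read -r record; do
        count=$(( count + 1 ))
        # Placeholder for the real per-record work (retrieve, convert, email).
        printf 'processing record %d: %s\n' "$count" "$record"
    done
    printf 'Called to process %d lines\n' "$count"
}

# Demo with three made-up records instead of a real chunk of the input file.
printf '%s\n' rec1 rec2 rec3 | process_records
```

In an actual worker script you would call process_records with no pipe in front of it, so that it reads whatever chunk is fed to the script's stdin.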

Now you can use GNU Parallel. First, let GNU Parallel split the file into chunks of 4,000 lines and pass one chunk to each of 4 jobs:

parallel --pipe -N4000 ./YourScript  < YourFile

Called to process     4000 lines
Called to process     4000 lines
Called to process     4000 lines
Called to process     4000 lines

If you have 4 or more CPU cores, this takes 2 seconds because, by default, GNU Parallel runs one job per CPU core.

Now try passing 2,000 lines to each job and running 4 jobs at a time:

parallel --pipe -j 4 -N2000 ./YourScript  < YourFile

Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines

This runs the first 4 batches of 2,000 lines in 2 seconds, then the second 4 batches of 2,000 lines in another 2 seconds.
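The arithmetic behind those timings can be written down as a small helper. This is a rough sketch only: estimate_seconds is a hypothetical name, and it assumes every job takes the same time and ignores start-up overhead.

```shell
#!/bin/bash
# estimate_seconds TOTAL_LINES LINES_PER_JOB JOBS_AT_ONCE SECONDS_PER_JOB
# Idealised estimate: every job takes the same time, no overhead.
estimate_seconds() {
    local total=$1 per_job=$2 slots=$3 secs=$4
    local jobs=$(( (total + per_job - 1) / per_job ))   # ceiling division
    local rounds=$(( (jobs + slots - 1) / slots ))      # batches of parallel jobs
    echo "jobs=$jobs rounds=$rounds seconds=$(( rounds * secs ))"
}

estimate_seconds 16000 4000 4 2   # prints: jobs=4 rounds=1 seconds=2
estimate_seconds 16000 2000 4 2   # prints: jobs=8 rounds=2 seconds=4
```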

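Hopefully you can now see how to parallelize your script. Remember that it reads from stdin rather than from a file! If you would rather have your script called with a filename as its argument, with GNU Parallel writing each chunk of the 16,000-line file to a temporary file, you can use: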

parallel --pipe -N 2000 --cat ./YourScript {} < YourFile

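GNU Parallel will then write a temporary file containing 2,000 lines, call your script with that file's name, and delete the temporary file afterwards.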
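For the filename-based variant, the script has to read from its first argument instead of stdin. A minimal sketch, assuming a hypothetical function name process_file; the demo creates its own temporary file in place of the one GNU Parallel would supply.

```shell
#!/bin/bash
# Hypothetical file-based variant of the dummy script: takes a filename
# as its first argument instead of reading stdin.
process_file() {
    local file=$1 lines
    lines=$(wc -l < "$file")
    # $(( lines )) strips the padding that some wc implementations add.
    echo "Called to process $(( lines )) lines from a file"
}

# Demo: a 2,000-line temporary file stands in for one chunk.
tmp=$(mktemp)
seq 2000 > "$tmp"
process_file "$tmp"
rm -f "$tmp"
```

In a real script the last lines would simply be process_file "$1".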

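Other useful switches to GNU Parallel are:

parallel --dry-run ... tells you what it would do without actually doing anything

parallel --bar ... gives you a progress bar

parallel --eta ... gives you an ETA

Note also that GNU Parallel can distribute work across other machines in your network, and that it handles failures and retries, output tagging, and so on.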
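For comparison, if GNU Parallel cannot be installed, plain bash background jobs can do a cruder version of the same thing. This is a different technique from the one above, shown only as a sketch: process_one_file and run_all_in_background are hypothetical names, the work is reduced to a line count, and nothing limits how many jobs run at once.

```shell
#!/bin/bash
# Hypothetical stand-in for the per-file work in the question.
process_one_file() {
    local file=$1
    echo "$(( $(wc -l < "$file") )) lines in ${file##*/}"
}

# Start one background job per file, then wait for every one of them.
run_all_in_background() {
    local file pid status=0
    local pids=()
    for file in "$@"; do
        process_one_file "$file" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1   # remember if any job failed
    done
    return "$status"
}

# Demo: split a 16,000-line sample into 8 files and process them concurrently.
workdir=$(mktemp -d)
seq 16000 | split -l 2000 - "$workdir/chunk."
run_all_in_background "$workdir"/chunk.*
rm -rf "$workdir"
```

The output lines can appear in any order because the jobs run concurrently. Note also that shared scratch files, such as the arsdoc output file in the question, would need a distinct name per job.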

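Also, you are running cut 6 times for every line of your 16,000-line file, which means you are forking nearly 100,000 processes! You can use IFS and read in place of those 6 processes: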

IFS='|' read -r f1 f2 f3 <<< "a|b|c"
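Applied to the loop in the question, that idea might look like the sketch below. The '|' delimiter is an assumption (the delimiter in the posted cut commands is not visible), the function name parse_records is made up, and the sample record is invented; the field positions 5 to 9 and 13 follow the cut calls in the question.

```shell
#!/bin/bash
# Assumption: fields are separated by '|'. Fields 5-9 and 13 are kept,
# all other positions are read into the throwaway variable _.
parse_records() {
    local contact_email credit_card_id ref_nr cny_cd lang pmt_ir
    while IFS='|' read -r _ _ _ _ contact_email credit_card_id ref_nr cny_cd lang _ _ _ pmt_ir _; do
        echo "email=$contact_email card=$credit_card_id ref=$ref_nr country=$cny_cd lang=$lang pmt_ir=$pmt_ir"
    done
}

# Demo with one invented 14-field record.
echo 'f1|f2|f3|f4|someone@example.com|CC123|REF9|US|EN|f10|f11|f12|3|f14' | parse_records
```

Each record is split by the shell itself, so no extra process is forked per field.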

Regarding linux - How to run multiple instances in parallel in a shell script to improve time efficiency, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59945327/