This article looks at the question "Is Perl faster than bash?" and how to approach it. The discussion below should be a useful reference for anyone facing the same problem.

Problem description

I have a bash script that cuts out a section of a logfile between 2 timestamps, but because of the size of the files, it takes quite a while to run.

If I were to rewrite the script in Perl, could I achieve a significant speed increase - or would I have to move to something like C to accomplish this?

#!/bin/bash

if [ $# -ne 3 ]; then
  echo "USAGE $0 <logfile(s)> <from date (epoch)> <to date (epoch)>"
  exit 1
fi

LOGFILES=$1
FROM=$2
TO=$3
rm -f /tmp/getlogs??????
TEMP=`mktemp /tmp/getlogsXXXXXX`

## LOGS NEED TO BE LISTED CHRONOLOGICALLY
ls -lnt $LOGFILES|awk '{print $8}' > $TEMP
LOGFILES=`tac $TEMP`
cp /dev/null $TEMP

findEntry() {
  RETURN=0
  dt=$1
  fil=$2
  ln1=$3
  ln2=$4
  t1=`tail -n+$ln1 $fil|head -n1|cut -c1-15`
  dt1=`date -d "$t1" +%s`
  t2=`tail -n+$ln2 $fil|head -n1|cut -c1-15`
  dt2=`date -d "$t2" +%s`
  if [ $dt -ge $dt2 ]; then
    mid=$ln2   # clamp to the last probed line ($dt2 is an epoch value, not a line number)
  else
    mid=$(( (($ln2-$ln1)*($dt-$dt1)/($dt2-$dt1))+$ln1 ))
  fi
  t3=`tail -n+$mid $fil|head -n1|cut -c1-15`
  dt3=`date -d "$t3" +%s`
  # finished
  if [ $dt -eq $dt3 ]; then
    # FOUND IT (scroll back to the first match)
    while [ $dt -eq $dt3 ]; do
      mid=$(( $mid-1 ))
      t3=`tail -n+$mid $fil|head -n1|cut -c1-15`
      dt3=`date -d "$t3" +%s`
    done
    RETURN=$(( $mid+1 ))
    return
  fi
  if [ $(( $mid-1 )) -eq $ln1 ] || [ $(( $ln2-1)) -eq $mid ]; then
    # FOUND NEAR IT
    RETURN=$mid
    return
  fi
  # not finished yet
  if [ $dt -lt $dt3 ]; then
    # too high
    findEntry $dt $fil $ln1 $mid
  else
    if [ $dt -ge $dt3 ]; then
      # too low
      findEntry $dt $fil $mid $ln2
    fi
  fi
}

# Check timestamps on logfiles
LOGS=""
for LOG in $LOGFILES; do
  filetime=`ls -ln $LOG|awk '{print $6,$7}'`
  timestamp=`date -d "$filetime" +%s`
  if [ $timestamp -ge $FROM ]; then
    LOGS="$LOGS $LOG"
  fi
done

# Check first and last dates in LOGS to refine further
for LOG in $LOGS; do
    if [ ${LOG%.gz} != $LOG ]; then
      gunzip -c $LOG > $TEMP
    else
      cp $LOG $TEMP
    fi
    t=`head -n1 $TEMP|cut -c1-15`
    FIRST=`date -d "$t" +%s`
    t=`tail -n1 $TEMP|cut -c1-15`
    LAST=`date -d "$t" +%s`
    if [ $TO -lt $FIRST ] || [ $FROM -gt $LAST ]; then
      # This file is entirely out of range
      cp /dev/null $TEMP
    else
      if [ $FROM -le $FIRST ]; then
        if [ $TO -ge $LAST ]; then
          # Entire file is within range
          cat $TEMP
        else
          # Last part of file is out of range
          STARTLINENUMBER=1
          ENDLINENUMBER=`wc -l<$TEMP`
          findEntry $TO $TEMP $STARTLINENUMBER $ENDLINENUMBER
          head -n$RETURN $TEMP
        fi
      else
        if [ $TO -ge $LAST ]; then
          # First part of file is out of range
          STARTLINENUMBER=1
          ENDLINENUMBER=`wc -l<$TEMP`
          findEntry $FROM $TEMP $STARTLINENUMBER $ENDLINENUMBER
          tail -n+$RETURN $TEMP
        else
          # range is entirely within this logfile
          STARTLINENUMBER=1
          ENDLINENUMBER=`wc -l<$TEMP`
          findEntry $FROM $TEMP $STARTLINENUMBER $ENDLINENUMBER
          n1=$RETURN
          findEntry $TO $TEMP $STARTLINENUMBER $ENDLINENUMBER
          n2=$RETURN
          tail -n+$n1 $TEMP|head -n$(( $n2-$n1 ))
        fi
      fi
    fi
done
rm -f /tmp/getlogs??????
Solution

Updated script based on Brent's comment: This one is untested.

#!/usr/bin/perl

use strict;
use warnings;

my %months = (
    jan => 1, feb => 2,  mar => 3,  apr => 4,
    may => 5, jun => 6,  jul => 7,  aug => 8,
    sep => 9, oct => 10, nov => 11, dec => 12,
);

while ( my $line = <> ) {
    my $ts = substr $line, 0, 15;
    next if parse_date($ts) lt '0201100543';
    last if parse_date($ts) gt '0715123456';
    print $line;
}

sub parse_date {
    my ($month, $day, $time) = split ' ', $_[0];
    my ($hour, $min, $sec) = split /:/, $time;
    return sprintf(
        '%2.2d%2.2d%2.2d%2.2d%2.2d',
        $months{lc $month}, $day,
        $hour, $min, $sec,
    );
}


__END__
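The `parse_date` helper builds a fixed-width `MMDDhhmmss` key so that plain string comparison orders timestamps chronologically within a year. On systems with GNU `date`, the same key can be produced from the shell, which is a quick way to sanity-check the cut-off constants used above (the sample timestamp is illustrative):

```shell
# Build the same MMDDhhmmss comparison key that parse_date produces.
# GNU date parses the syslog-style "Mon DD HH:MM:SS" timestamp, and
# %m%d%H%M%S prints it as a fixed-width, string-sortable key.
ts='Jul 15 12:34:56'
key=$(date -d "$ts" +%m%d%H%M%S)
echo "$key"   # 0715123456
```

Note that, like `parse_date`, this key drops the year, so it only sorts correctly within a single year of logs.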

Previous answer for reference: What is the format of the file? Here is a short script which assumes the first column is a timestamp and prints only lines that have timestamps in a certain range. It also assumes that the timestamps are sorted. On my system, it took about a second to filter 900,000 lines out of a million:

#!/usr/bin/perl

use strict;
use warnings;

while ( <> ) {
    my ($ts) = split;
    next if $ts < 1247672719;
    last if $ts > 1252172093;
    print $ts, "\n";
}

__END__
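For comparison, the same single-pass, stop-early filter can be sketched in awk; like the Perl version it assumes the first column is a sorted epoch timestamp, and the sample file path and bounds below are illustrative:

```shell
# Create a tiny sample log: sorted epoch timestamps in column 1.
printf '%s\n' \
  '1247672718 before-range' \
  '1247672719 in-range-1' \
  '1252172093 in-range-2' \
  '1252172094 after-range' > /tmp/sample.log

# Skip lines before `from`; exit at the first line past `to`. The early
# exit on sorted input mirrors Perl's `last` and is what keeps a single
# pass fast -- no per-line process spawning as in the bash script.
awk -v from=1247672719 -v to=1252172093 \
    '$1 < from { next } $1 > to { exit } { print }' /tmp/sample.log
```

Either way, the decisive cost in the original bash script is forking `tail`, `head`, `cut`, and `date` for every probe; any single-process, single-pass filter avoids that.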

That concludes this look at "Is Perl faster than bash?". We hope the answers above are helpful, and thank you for your support!
