Problem description
A second question about processing log files for table patterns. I am analysing a large number of dlg text files located in the workdir. Each file has a table (usually located at the end of the log) in the following format:
RMSD TABLE
__________
_____________________________________________________________________
| | | | | |
Rank | Sub- | Run | Binding | Cluster | Reference | Grep
| Rank | | Energy | RMSD | RMSD | Pattern
_____|______|______|___________|_________|_________________|___________
1 1 7 -1.43 0.00 178.12 RANKING
1 2 18 -0.96 1.88 177.35 RANKING
2 1 4 -0.97 0.00 178.43 RANKING
3 1 13 -0.60 0.00 178.03 RANKING
4 1 5 -0.56 0.00 198.10 RANKING
5 1 16 +0.01 0.00 189.71 RANKING
6 1 3 +0.06 0.00 176.95 RANKING
7 1 19 +0.10 0.00 177.27 RANKING
8 1 17 +0.13 0.00 177.60 RANKING
9 1 8 +0.20 0.00 177.05 RANKING
10 1 20 +0.27 0.00 177.43 RANKING
11 1 10 +0.34 0.00 176.33 RANKING
12 1 6 +0.37 0.00 177.30 RANKING
13 1 9 +0.44 0.00 175.48 RANKING
14 1 2 +0.46 0.00 175.67 RANKING
15 1 11 +0.84 0.00 177.52 RANKING
15 2 12 +1.31 1.95 178.03 RANKING
16 1 14 +1.29 0.00 201.01 RANKING
17 1 15 +1.65 0.00 175.50 RANKING
18 1 1 +1.96 0.00 186.83 RANKING
Run time 3.909 sec
Idle time 0.817 sec
The aim is to loop over all the .dlg files and take a single line from each table: its first data row (ignoring the header), omitting the last column (which is normally provided for grep recognition). In the example above, that is the following line:
1 1 7 -1.43 0.00 178.12
Then I need to add this line to final_log.txt together with the name of the log file (which should be written before it). Based on my very recent experience, a possible model for my BASH workflow (for processing several files) may be:
#!/bin/bash
# name of the folder containing all *.dlg files to be analysed
prot='7000'
# path to the folder with these *.dlg files
dlg_dir="$PWD/${prot}"
# make a final log
echo 'This is a list of processed files' > "$PWD/final_results.log"
# we loop over all *.dlg files in order to extract Clustering Histogram to the final LOG file
for f in "$dlg_dir"/*.dlg
do
    # file name without the directory and without the .dlg extension
    file_name2=$(basename "$f")
    file_name="${file_name2%.dlg}"
    echo "Processing of $f..."
    # here is an expression for GREP to take the line from the table and save it to >> $PWD/final_results.log
done
Recommended answer
How about starting with this, assuming a gawk with nextfile support:
gawk '$1~/[[:digit:]]/{ print FILENAME, substr($0,1,match($0,/[[:blank:]]+[^[:blank:]]+$/)-1);nextfile}' *.dlg
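The one-liner above prints the first line of each file whose first field contains a digit, so it relies on no such line appearing before the table. As a hedged variant, the sketch below only starts looking after the "RMSD TABLE" title and takes the first row that ends in RANKING. The function names extract_top_rank and collect_top_ranks are hypothetical, and the sketch assumes every log contains the title and rows in the form shown in the question.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original answer): print the first data
# row of the RMSD table in one .dlg file, without the trailing RANKING column.
# Assumes the table title "RMSD TABLE" precedes the rows, as in the sample log.
extract_top_rank() {
    awk '
        /RMSD TABLE/ { in_table = 1; next }    # only look at lines after the table title
        in_table && $NF == "RANKING" {         # first row tagged with the grep pattern
            sub(/[[:blank:]]+RANKING[[:blank:]]*$/, "")   # drop the last column
            print
            exit                               # one row per file is enough
        }
    ' "$1"
}

# Hypothetical wrapper: print "<log name> <top-ranked row>" for every .dlg file
# in the directory given as the first argument.
collect_top_ranks() {
    local dir=$1 f row
    for f in "$dir"/*.dlg; do
        [ -e "$f" ] || continue                # skip the literal pattern if nothing matches
        row=$(extract_top_rank "$f")
        printf '%s %s\n' "$(basename "$f" .dlg)" "$row"
    done
}
```

With the folder from the question, this could be used as `collect_top_ranks "$PWD/7000" >> "$PWD/final_results.log"`. The row keeps the column spacing of the original log, which may or may not be what you want in the final file.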