Problem description
I am trying to split a file into smaller files according to the value of the fifth field. A very nice way to do this was already suggested, and also here.
However, I am trying to incorporate this into a .sh script for qsub, without much success.
The problem is in the part where the output file for each line is specified, i.e.,
f = "Alignments_" $5 ".sam"; print > f
There I need to use a variable declared earlier in the script, which names the directory the file should be written to. The variable is built separately for each task when I submit an array job over multiple files.
So say $output_path is ./Sample1
I need to write something like
f = $output_path "/Alignments_" $5 ".sam"; print > f
But awk does not seem to like a $variable that is not one of its own $fields. I don't even think it likes having two "strings" before and after the $5.
The error I get back is that it takes the first line of the file to be split (little.sam) and tries to use it as the name f, followed by "/Alignments_" $5 ".sam" (those last three put together correctly). It says, naturally, that the name is too long.
How can I write this so it works?
Thanks!
awk -F '[:\t]' ' # read the list of numbers in Tile_Number_List
FNR == NR {
num[$1]
next
}
# process each line of the .BAM file
# any lines with an "unknown" $5 will be ignored
$5 in num {
f = "Alignments_" $5 ".sam"
print > f
} ' Tile_Number_List.txt little.sam
UPDATE, after adding -v to awk and declaring the variable opath
input=$1
outputBase=${input%.bam}
mkdir -v $outputBase\_TEST
newdir=$outputBase\_TEST
samtools view -h $input | awk 'NR >= 18' | awk -F '[\t:]' -v opath="$newdir" '
FNR == NR {
num[$1]
next
}
$5 in num {
f = newdir"/Alignments_"$5".sam";
print > f
} ' Tile_Number_List.txt -
mkdir: created directory `little_TEST'
awk: cmd. line:10: (FILENAME=- FNR=1) fatal: can't redirect to `/Alignments_1101.sam' (Permission denied)
To pass the value of a shell variable such as $output_path to awk, you need to use the -v option.
$ output_path=./Sample1/
$ awk -F '[:\t]' -v opath="$output_path" '
# read the list of numbers in Tile_Number_List
FNR == NR {
num[$1]
next
}
# process each line of the .BAM file
# any lines with an "unknown" $5 will be ignored
$5 in num {
f = opath"Alignments_"$5".sam"
print > f
} ' Tile_Number_List.txt little.sam
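As for why the original attempt picked up the whole input line: inside an awk program, `$` is the field operator, not shell expansion. A name that was never assigned in awk is empty (0 when used as a number), so `$output_path` means `$0`, the entire record, while a bare unassigned name adds nothing to the string. The sketch below puts both cases next to the `-v` form. The sample record and the `./Sample1` value are made up for illustration.

```shell
#!/bin/sh
# Made-up one-line record; split on ':' or tab, field 5 is 1101.
line='HWI:1:FC:1:1101:100:200'
output_path=./Sample1

# Case 1: inside the single-quoted program, $output_path is awk's $0 (the whole record).
wrong=$(printf '%s\n' "$line" | awk -F '[:\t]' '{ print $output_path "/Alignments_" $5 ".sam" }')

# Case 2: an awk variable that was never assigned is the empty string.
empty=$(printf '%s\n' "$line" | awk -F '[:\t]' '{ print newdir "/Alignments_" $5 ".sam" }')

# Case 3: -v copies the shell value into the awk variable opath.
right=$(printf '%s\n' "$line" | awk -F '[:\t]' -v opath="$output_path" '{ print opath "/Alignments_" $5 ".sam" }')

printf '%s\n' "$wrong" "$empty" "$right"
```

Case 1 matches the "name too long" symptom, and case 2 is how a path such as `/Alignments_1101.sam` at the filesystem root can come about.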
Also, you still have the error from your previous question left in your script.
EDIT:
The awk variable created with -v is opath, but you use newdir inside the awk program. What you want is:
input=$1
outputBase=${input%.bam}
mkdir -v "${outputBase}_TEST"
newdir="${outputBase}_TEST"
samtools view -h "$input" | awk -F '[\t:]' -v opath="$newdir" '
FNR == NR {
num[$1]
next
}
# FNR restarts for each input, so this skips the first 17 lines of the samtools stream
FNR >= 18 && $5 in num {
f = opath"/Alignments_"$5".sam" # <-- opath is the awk variable not newdir
print > f
}' Tile_Number_List.txt -
You should also move the header skip (NR >= 18) into the second awk script, so the separate awk 'NR >= 18' stage is no longer needed.
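One way to check the merged form without samtools is to feed it made-up input. In this sketch the tile list, the two-line header, and the three records are all invented, `fake_view` is a hypothetical stand-in for `samtools view -h`, and the header is 2 lines long rather than 17. The header test is written against FNR because NR keeps counting across both inputs, so on standard input NR is already offset by the length of the tile list.

```shell
#!/bin/sh
# Scratch directory so the sketch does not touch real data.
work=$(mktemp -d)
newdir="$work/little_TEST"
mkdir -p "$newdir"

# Invented tile list: only tiles 1101 and 1103 are wanted.
printf '1101\n1103\n' > "$work/Tile_Number_List.txt"

# Hypothetical stand-in for samtools view -h: 2 header lines, then 3 records.
fake_view() {
    printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n'
    printf 'HWI:1:FC:1:1101:10:20\t0\tchr1\n'
    printf 'HWI:1:FC:1:1102:11:21\t0\tchr1\n'
    printf 'HWI:1:FC:1:1103:12:22\t0\tchr1\n'
}

# skip is the number of header lines to drop (17 in the question, 2 here).
fake_view | awk -F '[\t:]' -v opath="$newdir" -v skip=2 '
FNR == NR {                  # first file: remember every tile number
    num[$1]
    next
}
FNR > skip && $5 in num {    # stdin: past the header and the tile is listed
    f = opath "/Alignments_" $5 ".sam"
    print > f
}' "$work/Tile_Number_List.txt" -

ls "$newdir"
```

With this input only the files for tiles 1101 and 1103 should appear in the scratch directory, since tile 1102 is not in the list.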