I have two columns like this:
cluster22717 GO:0005737,GO:0007049,GO:0051301
How can I convert it to:
cluster22717 GO:0005737
cluster22717 GO:0007049
cluster22717 GO:0051301
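I should also mention that this is a single line from a file with thousands of lines, and the second column holds a varying number of elements.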
Thanks in advance,
Pezhman Safdari
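Best answer
The simplest solution is to use a couple of loops: an outer loop over the lines of the file and an inner loop over the comma-separated values in the second column. See the example below.
Input file: sample.txt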
cluster22717 GO:0005737,GO:0007049,GO:0051301
cluster22717 GO:0005738,GO:0007041,GO:0051304,GO:0051307
cluster22717 GO:0005739,GO:0007042,GO:0051305,GO:0005737,GO:0007046
cluster22717 GO:0005740,GO:0007043,GO:0051306,GO:0005738,GO:0007041,GO:0051304
Script:
#!/bin/bash
# Read each line: first field goes to var1, second to vals, anything extra is discarded
while read -r var1 vals _
do
    # Split the second field on commas into an array
    IFS=',' read -r -a Arrayvals <<< "$vals"
    # Iterate over the array; ${#Arrayvals[@]} gives its length
    for (( i = 0; i < ${#Arrayvals[@]}; i++ ))
    do
        echo "${var1} ${Arrayvals[i]}"   # print in the desired format
    done
done < sample.txt
Output:
cluster22717 GO:0005737
cluster22717 GO:0007049
cluster22717 GO:0051301
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
cluster22717 GO:0051307
cluster22717 GO:0005739
cluster22717 GO:0007042
cluster22717 GO:0051305
cluster22717 GO:0005737
cluster22717 GO:0007046
cluster22717 GO:0005740
cluster22717 GO:0007043
cluster22717 GO:0051306
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
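For a file with thousands of lines, a shell loop can be slow, so the same transformation could probably also be done in a single awk call. This is only a sketch, assuming the two columns are separated by whitespace and the terms by commas, as in sample.txt; the function name split_terms is made up for illustration and is not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption, not from the original answer):
# prints one "cluster term" pair per line for every comma-separated term in column 2.
# Reads the files given as arguments, or standard input if none are given.
split_terms() {
    awk '{
        n = split($2, terms, ",")      # break the second field on commas
        for (i = 1; i <= n; i++)       # arrays filled by split() start at index 1
            print $1, terms[i]         # first field, a space, then one term
    }' "$@"
}

# Usage: split_terms sample.txt
```

Because awk reads the file once and does the splitting itself, no extra processes are started per line, which is where the loop version spends most of its time.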
Hope this helps,