I have two columns like this:

cluster22717    GO:0005737,GO:0007049,GO:0051301

How can I convert it to:
cluster22717    GO:0005737
cluster22717    GO:0007049
cluster22717    GO:0051301

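I should also mention that this is just one line of a file with thousands of lines, and the second column holds a varying number of elements.
Thanks in advance,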
佩兹曼·萨夫达里

Best answer

The simplest solution is to loop over the lines and split the second column; see the example below.
Input file: sample.txt

cluster22717    GO:0005737,GO:0007049,GO:0051301
cluster22717    GO:0005738,GO:0007041,GO:0051304,GO:0051307
cluster22717    GO:0005739,GO:0007042,GO:0051305,GO:0005737,GO:0007046
cluster22717    GO:0005740,GO:0007043,GO:0051306,GO:0005738,GO:0007041,GO:0051304

Script (bash):
while read -r var1 vals _                      # var1 = first field, vals = second field, _ = anything after it
do
    IFS=',' read -r -a Arrayvals <<< "$vals"   # split the second field on commas into an array

    for val in "${Arrayvals[@]}"               # iterate over every element of the array
    do
        echo "${var1}    ${val}"               # print in the desired format
    done

done < sample.txt

Output:
cluster22717    GO:0005737
cluster22717    GO:0007049
cluster22717    GO:0051301
cluster22717    GO:0005738
cluster22717    GO:0007041
cluster22717    GO:0051304
cluster22717    GO:0051307
cluster22717    GO:0005739
cluster22717    GO:0007042
cluster22717    GO:0051305
cluster22717    GO:0005737
cluster22717    GO:0007046
cluster22717    GO:0005740
cluster22717    GO:0007043
cluster22717    GO:0051306
cluster22717    GO:0005738
cluster22717    GO:0007041
cluster22717    GO:0051304
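
For a file with thousands of lines, a single awk pass may be faster than a shell loop, since it avoids per-line work in the shell. The following is only a sketch of that alternative, not part of the script above: the function name split_terms and the file name sample_one.txt are made up for illustration, and it assumes the two columns are separated by spaces or tabs and the terms by commas.

```shell
# split_terms is a hypothetical helper name chosen for this sketch.
# It prints one "key    term" pair per line for every comma-separated
# term found in the second column of the file given as $1.
split_terms() {
    awk '{
        n = split($2, terms, ",")        # split the second field on commas
        for (i = 1; i <= n; i++)
            print $1 "    " terms[i]     # first field, four spaces, one term
    }' "$1"
}

# Small demo input (sample_one.txt is an assumed file name).
printf 'cluster22717    GO:0005737,GO:0007049,GO:0051301\n' > sample_one.txt
split_terms sample_one.txt
```

Running split_terms sample.txt on the input file shown earlier should give the same pairs as the loop-based script.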

Hope this helps.

07-24 20:23