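I have to create an awk script that performs the following transformation:
The column order is random.
There is no fixed structure, which is actually the big problem.
The FLAG column must be split into FLAG1 and FLAG2.
FLAG1 and FLAG2 are filled in under the following conditions: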

if the VAL is ":" then NUM is null
if the VAL is ":" and FLAG "c" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "u" then NUM is null and FLAG2 is "u"
if the VAL is "14,385" and FLAG "d" then NUM is "14385" and FLAG(both) is null
if the VAL is "14,385" and FLAG "du" then NUM is "14385" and FLAG2 is "u"
if the VAL is ":" and FLAG "cd" then NUM is null and FLAG1 is "c"
if the VAL is ":" and FLAG "bc" then NUM is null and FLAG1 is "c" and FLAG2 is "b"
if the VAL is ":" and FLAG "z" then NUM is 0 and FLAG2 is "z"

The CSV input file is:
"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   c
"TCP",   "T5-78", "HT",    "12,385",  c
"TZP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   cd
"TNP",   "T5-78", "HT",    ":",   du
"TNP",   "T5-78", "HT",    "12,524,652",  dfg

The output .dat file should look like this:
PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78   HT  null    c   null
TCP T5-78   HT  12385   c   null
TZP T5-78   HT  0   null    z
TNP T5-78   HT  0   null    z
TNP T5-78   HT  null    c   null
TNP T5-78   HT  null    null    u
TNP T5-78   HT  12524652    null    dfg

The code I tried does not work properly: only the first 3 requirements are met, and the 4th one does not work.
BEGIN {
      FS=","; OFS="\t";
      a["PRIM"]=1;a["TRD"]=1;a["GTR"]=1;a["VAL"]=1;a["FLAG"]=1;
    }
    NR==1 {

   { $a["VAL"] = "NUMB" ; $a["FLAG"] = "FLAG1" ; $5 = "FLAG2" ; print ; next }
    $a["VAL"]=="12,385" && $a["FLAG"] == "d"  { $a["VAL"] = "14385" ; $a["FLAG"] = $5 = "" }
    $a["VAL"]=="12,385" && $a["FLAG"] == "du" { $a["VAL"] = "14385" ; $a["FLAG"] = "" ; $9 = "u" }
    $a["VAL"] != ":" { print ; next }
    $a["FLAG"] == "z" { $a["VAL"] = "0" ; $a["FLAG"] = "" ; $5 = "z" }
     $a["FLAG"] != "z" { $a["VAL"] = "" }

        $NF=substr($NF,1,length($NF)-1);
        for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
    }
    {   print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
        NR==1?"FLAG1"OFS"FLAG2":($a["FLAG"]?""OFS$a["FLAG"]:$a["FLAG"]);

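This is the latest code, and I thought it would work. The problem I cannot solve now is that the last value (FLAG2) is printed on a second line. I tried adding OFS, but that does not solve the problem. Can you tell me what is going wrong here?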
BEGIN {
FS=",";
OFS="\t";
a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {
    $NF=substr($NF,1,length($NF)-1);
    for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;

a["FLAG1"] = i;
a["FLAG2"]=i;
a["FLAG1"] = a["FLAG"];  # just for testing and it is ok
a["FLAG2"] = a["FLAG"];  # just for testing and it is ok

}

{

print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"],
    NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];

}
The result is:
PRIM    TRD GTR NUM FLAG1   FLAG2
TPP T5-78   HT  null    c
   null
TCP T5-78   HT  12385   c
   null
TZP T5-78   HT  0   null
    z

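After so many suggestions, this is my latest version, but it still does not work... Now, when I add the if statements to meet the requirements above, nothing happens. I think the if statements are either incorrect or not placed in the right location.
The NR>1 block that prints the values is a disaster.
Can you tell me what is wrong with my script? I have to admit that I started writing this script three days ago and it has been painful so far... The problem is that I was supposed to have finished this script last week.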
BEGIN {
FS=",";
OFS="\t";

a["PRIM"]=1;
a["TRD"]=1;
a["GTR"]=1;
a["VAL"]=1;
a["FLAG"]=1;
a["FLAG1"]=1;
a["FLAG2"]=1;
}

NR==1 {

$NF=substr($NF,1,length($NF)-1);
    for(i=1;i<=NF;i++)
#if($i in a)
a[$i]=i;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}

{
#initialisation of the new flags
a["FLAG1"]=="";
a["FLAG2"]=="";
}

#MY IF STATEMENTS GO HERE   - TEST MODE

a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
a["FLAG"] == "du"   {a["FLAG2"]= "u"}

{
#print header
print $a["PRIM"],$a["TRD"],$a["GTR"],NR==1?"NUM":$a["VAL"], NR==1?"FLAG1":$a["FLAG1"],NR==1?"FLAG2":$a["FLAG2"];
}

#print content
NR>1
{
    for(j=1;j<=NF;j++)
#if($i in a)
a[$j]=j;

#a["FLAG1"] = a[i];
#a["FLAG2"]=a[i];

a["FLAG1"] = a["FLAG"];
a["FLAG2"] = a["FLAG"];
}
#MY IF STATEMENTS GO HERE   - TEST MODE

a["FLAG"] == "cd"   {a["FLAG1"]= "c"}
a["FLAG"] == "du"   {a["FLAG2"]= "u"}

{
print $a["PRIM"],$a["TRD"],$a["GTR"],$a["VAL"], $a["FLAG1"], $a["FLAG2"]
}

Best answer

This requires all input fields to be double-quoted.

$ echo '"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   "c"
"TCP",   "T5-78", "HT",    "12,385",  "c"
"TZP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "z"
"TNP",   "T5-78", "HT",    ":",   "cd"
"TNP",   "T5-78", "HT",    ":",   "du"
"TNP",   "T5-78", "HT",    "12,524,652",  "dfg"' |
awk -F '",[ \t]*"' '
    { sub(/^"/, "", $1); sub(/"$/, "", $NF)}
    NR == 1 {
        for (i=1; i<=NF; i++) col[$i] = i
        print "PRIM TRD GTR NUM FLAG1 FLAG2"
        next
    }
    {
        f = $col["FLAG"]
        v = $col["VAL"]; gsub(/,/, "", v)
        num = "null"; flag1 = "null"; flag2 = "null"
    }
    v == ":"      &&  f == "c"   {flag1 = "c"}
    v == ":"      &&  f == "u"   {flag2 = "u"}
    v == "14385"  &&  f == "d"   {num = v}
    v == "14385"  &&  f == "du"  {num = v; flag2 = "u"}
    v == ":"      &&  f == "cd"  {flag1 = "c"}
    v == ":"      &&  f == "bc"  {flag1 = "c"; flag2 = "b"}
    v == ":"      &&  f == "z"   {num = 0; flag2 = "z"}
    {print $col["PRIM"],$col["TRD"],$col["GTR"],num,flag1,flag2}
'

PRIM TRD GTR NUM FLAG1 FLAG2
TPP T5-78 HT null c null
TCP T5-78 HT null null null
TZP T5-78 HT null null z
TNP T5-78 HT null null z
TNP T5-78 HT null c null
TNP T5-78 HT null null null
TNP T5-78 HT null null null

My output is not the same as yours. Check your specifications, and make sure the sample input is sufficient to cover them.
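
The answer needs every field to be quoted, while the sample input in the question leaves FLAG unquoted. The following is a minimal sketch, not a definitive implementation, of one way to handle that input. It assumes fields are separated by a comma followed by at least one space or tab (so the commas inside "12,385" do not split the value), strips optional quotes from each field, and looks columns up by header name. The file names input.csv and output.dat are hypothetical. The generalization of the rules is also an assumption: c goes to FLAG1, d is dropped, the remaining letters go to FLAG2, and z with ":" sets NUM to 0. The fallback that copies the whole flag to FLAG2 when it contains a letter outside b, c, d, u, z is a guess made only to reproduce the dfg row of the expected output.

```shell
# Sample input from the question; the FLAG values are not quoted.
cat > input.csv <<'EOF'
"PRIM",  "TRD",   "GTR",   "VAL",   "FLAG"
"TPP",   "T5-78", "HT",    ":",   c
"TCP",   "T5-78", "HT",    "12,385",  c
"TZP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   z
"TNP",   "T5-78", "HT",    ":",   cd
"TNP",   "T5-78", "HT",    ":",   du
"TNP",   "T5-78", "HT",    "12,524,652",  dfg
EOF

awk -F ',[ \t]+' -v OFS='\t' '
    {
        sub(/\r$/, "")                        # assumption: the file may have CRLF line endings
        for (i = 1; i <= NF; i++) {           # strip optional surrounding quotes
            sub(/^"/, "", $i); sub(/"$/, "", $i)
        }
    }
    NR == 1 {
        for (i = 1; i <= NF; i++) col[$i] = i # header name -> column index
        print "PRIM", "TRD", "GTR", "NUM", "FLAG1", "FLAG2"
        next
    }
    {
        v = $col["VAL"]; f = $col["FLAG"]
        gsub(/,/, "", v)                      # "12,385" -> "12385"
        num = (v == ":") ? "null" : v
        flag1 = "null"; flag2 = "null"
        if (f ~ /[^bcduz]/) {
            flag2 = f                         # assumed fallback for flags the rules do not cover
        } else {
            if (f ~ /c/) flag1 = "c"
            if (f ~ /z/ && v == ":") num = 0
            rest = f; gsub(/[cd]/, "", rest)  # c is already in FLAG1, d is dropped
            if (rest != "") flag2 = rest
        }
        print $col["PRIM"], $col["TRD"], $col["GTR"], num, flag1, flag2
    }
' input.csv > output.dat

cat output.dat
```

Under those assumptions, output.dat matches the expected output in the question for the sample input. If real files contain a comma followed by a space inside a quoted value, this field separator will not be enough.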

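Three awk behaviours are relevant to the last script in the question, where adding the conditions appeared to do nothing. A pattern and its opening brace must be on the same line, `==` compares while `=` assigns, and `a["FLAG"]` is the column number while `$a["FLAG"]` is the field value. The small demonstration below uses a hypothetical one-column file, flags.txt, and hypothetical output file names; it is only an illustration of those three points, not a fix for the full script.

```shell
printf 'FLAG\ncd\ndu\n' > flags.txt

# 1. "NR > 1" on its own line is a complete rule with the default action
#    (print the record). The block on the next line is a second rule that
#    runs for every record, the header included.
awk 'NR > 1
     { n++ }
     END { print "block ran for " n " records" }' flags.txt > split_rule.txt

# 2. and 3. Comparison versus assignment, column index versus field value.
awk 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
     {
         flag1 == ""          # comparison: the result is discarded, flag1 is unchanged
         flag1 = "null"       # assignment
     }
     col["FLAG"] == "cd"  { print "never printed" }   # compares the index 1 with "cd"
     $col["FLAG"] == "cd" { flag1 = "c" }             # compares the field value
     { print $col["FLAG"], flag1 }' flags.txt > fixed_rule.txt

cat split_rule.txt fixed_rule.txt
```

The first command prints the two data lines followed by "block ran for 3 records", which shows that the block was not restricted to NR > 1. The second prints "cd c" and "du null".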