I have a file that looks like this:

    /translation="MDGVTQQNAALVQEATTAAASLEEQARNLTAAVAAFDLGDKQTV
    LITPRAAVPALKRPALKASLPASSSHGNWETF"
    /product="Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)"
    CDS complement(471..590)
    /db_xref="SEED:fig|1240086.14.peg.2"
    /translation="MHQYQSAILAKICRYGGIEKPEITPASVYKLDSHWRYVI"
    /product="hypothetical protein"
    CDS 717..2354
    /db_xref="SEED:fig|1240086.14.peg.3"
    /translation="MGFFVVLWGGASGFSLYSLKQVTTLLHDNSTQGRTYTYLVYGND
    QYFRSVTRMARVMDYSQFSDAAIASLEEQAQQLTKAVEVFHLGSEYQTAAS
    RTRPAGNMALKRPALSGMAPALPPARTASDEGSWEKF"
    /product="Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)"
    /product="macromolecule metabolism; macromolecule degradation; degradation of proteins, peptides, glycopeptides"

I need to extract the text between the quotes after each "/product=", so I need this:

    Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)
    hypothetical protein
    Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)
    macromolecule metabolism; macromolecule degradation; degradation of proteins, peptides, glycopeptides

I have to use awk, so I wrote this:

    awk '/\/product/ {split($0, a, "\""); printf a[2] "\n"}'

but this only picks up the text on the same line as "/product", and sometimes the text spans two or three lines. I'm out of ideas on how to get the entire text between the quotes. Can anyone help?

Solution

awk to the rescue!
This needs multi-character RS support (gawk):

    $ awk -v RS='/| CDS' -F'"' '/^product/{gsub("\n +"," "); print $2}' file
    Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)
    hypothetical protein
    Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)
    macromolecule metabolism; macromolecule degradation; degradation of proteins, peptides, glycopeptides

Explanation

Set the record separator so that a new record starts at every "/" or " CDS", select the records that begin with product, collapse each line break plus the indentation that follows it into a single space, and print the text between the first pair of double quotes (the second field, because the field delimiter is set to the double quote).
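If gawk is not available, the same idea can be sketched without a regex RS by accumulating lines until the closing quote appears. This is only a sketch under assumptions: the function name `extract_products` and the sample input are hypothetical (the wrapped `/product` line is made up to illustrate the multi-line case), and it assumes the product text never contains a double quote.

```shell
#!/bin/sh
# Hypothetical helper: print the quoted value of every /product qualifier,
# joining values that wrap across several lines. Reads files or stdin.
extract_products() {
    awk '
        # Start of a /product qualifier: drop everything up to the opening quote.
        !inq && /\/product="/ {
            sub(/.*\/product="/, "")
            buf = ""
            inq = 1
        }
        # While inside the quoted value, append each line (minus indentation).
        inq {
            sub(/^[ \t]+/, "")
            buf = (buf == "" ? $0 : buf " " $0)
            # A double quote marks the end of the value.
            if (buf ~ /"/) {
                sub(/".*/, "", buf)
                print buf
                inq = 0
            }
        }
    ' "$@"
}

# Made-up sample in the spirit of the question, with one wrapped product line.
extract_products <<'EOF'
     CDS             complement(471..590)
                     /db_xref="SEED:fig|1240086.14.peg.2"
                     /translation="MHQYQSAILAKICRYGGIEKPEITPASVYKLDSHWRYVI"
                     /product="hypothetical protein"
     CDS             717..2354
                     /db_xref="SEED:fig|1240086.14.peg.3"
                     /product="Methyl-accepting chemotaxis protein I (serine
                     chemoreceptor protein)"
EOF
```

Unlike the RS-based one-liner, this version does not split on "/" characters, so a "/" inside a product description should not cut the value short; the trade-off is a longer script.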