Adding an exclusion array to existing awk code


Problem description

Given:

awk '
BEGIN{
  num=split("a the to at in on with and but or",array," ")
  for(i=1;i<=num;i++){
    smallLetters[array[i]]
  }
}
/TITLE/{
  for(i=2;i<=NF;i++){
    if(tolower($i) in smallLetters){
      $i=tolower(substr($i,1,1)) substr($i,2)
    }
    else{
      if($i~/^\"/){
        $i=substr($i,1,1) toupper(substr($i,2,1)) substr($i,3)
      }
      else{
        $i=toupper(substr($i,1,1)) substr($i,2)
      }
    }
  }
}
1
'  Input_file

This code properly capitalizes the lines of a file that match some text, in this case TITLE. The idea is to use it to modify some cue sheet files and capitalize them properly, following three basic rules:

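Capitalize all words, with the following exceptions:
Lowercase all articles (a, the), prepositions (to, at, in, on, with), and coordinating conjunctions (and, but, or).
Capitalize the first and last words of the title, regardless of part of speech.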

I would like to modify the awk code to add a second array with a list of words to exclude, so that they are always written exactly as they appear in that array.

This would be very useful for words like McCartney, feat., vs., CD, USA, NYC, etc., because without this exclusion array they would be changed to Mccartney, Feat., Cd, Usa, Nyc, etc. The exclusion should apply even when these words are the first or last word of the TITLE, as explained in the related question.

For example, with an array like "McCartney feat. vs. CD USA NYC", the code must convert this:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
    TITLE "dig A pony, Feat. paul mccartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02

into this:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
    TITLE "Dig a Pony, feat. Paul McCartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02

instead of this:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
    TITLE "Dig a Pony, Feat. Paul Mccartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02

Thanks.

Recommended answer

The OP mentioned that there could also be words like "a" (a single word with a double quote on both sides), so the following version handles that case as well.

awk '
BEGIN{
  s1="\""
  num=split("McCartney feat. vs. CD USA NYC",array," ")
  for(k=1;k<=num;k++){
     temp=tolower(array[k])
     ignoreLetters[temp]=array[k]
  }
  num=split("a the to at in on with and but or",array," ")
  for(i=1;i<=num;i++){
    smallLetters[array[i]]=array[i]
  }
}
/TITLE/{
  for(i=2;i<=NF;i++){
    front=end=nothing=both=""
    if($i~/^"/ && $i!~/"$/){
      temp=tolower(substr($i,2))
      front=1
    }
    else if($i ~ /^".*"$/){
      temp=tolower(substr($i,2,length($i)-2))
      both=1
    }
    else if($i ~/"$/ && $i!~/^"/){
      temp=tolower(substr($i,1,length($i)-1))
      end=1
    }
    else{
      temp=tolower($i)
      nothing=1
    }
    if(temp in ignoreLetters){
      if(front){
         $i=s1 ignoreLetters[temp]
      }
      else if(end){
         $i=ignoreLetters[temp] s1
      }
      else if(both){
         $i=s1 ignoreLetters[temp] s1
      }
      else if(nothing){
         $i=ignoreLetters[temp]
      }
    }
    else if(temp in smallLetters){
      if(front){
         $i=s1 smallLetters[temp]
      }
      else if(end){
         $i=smallLetters[temp] s1
      }
      else if(nothing){
         $i=smallLetters[temp]
      }
      else if(both){
         $i=s1 smallLetters[temp] s1
      }
    }
    else{
      if($i~/^\"/){
        $i=substr($i,1,1) toupper(substr($i,2,1)) substr($i,3)
      }
      else{
        $i=toupper(substr($i,1,1)) substr($i,2)
      }
    }
  }
}
1
'  Input_file

Could you please try the following?

awk '
BEGIN{
  s1="\""
  num=split("McCartney feat. vs. CD USA NYC",array," ")
  for(k=1;k<=num;k++){
     temp=tolower(array[k])
     ignoreLetters[temp]=array[k]
  }
  num=split("a the to at in on with and but or",array," ")
  for(i=1;i<=num;i++){
    smallLetters[array[i]]=array[i]
  }
}
/TITLE/{
  for(i=2;i<=NF;i++){
    front=end=nothing=""
    if($i~/^"/){
      temp=tolower(substr($i,2))
      front=1
    }
    else if($i ~/"$/){
      temp=tolower(substr($i,1,length($i)-1))
      end=1
    }
    else{
      temp=tolower($i)
      nothing=1
    }
    if(temp in ignoreLetters){
      if(front){
         $i=s1 ignoreLetters[temp]
      }
      else if(end){
         $i=ignoreLetters[temp] s1
      }
      else if(nothing){
         $i=ignoreLetters[temp]
      }
    }
    else if(tolower($i) in smallLetters){
      $i=tolower(substr($i,1,1)) substr($i,2)
    }
    else{
      if($i~/^\"/){
        $i=substr($i,1,1) toupper(substr($i,2,1)) substr($i,3)
      }
      else{
        $i=toupper(substr($i,1,1)) substr($i,2)
      }
    }
  }
}
1
'  Input_file

The output will be as follows:

FILE "Two The Beatles Songs.wav" WAVE
  TRACK 01 AUDIO
TITLE "Dig a Pony, feat. Paul McCartney"
    PERFORMER "The Beatles"
    INDEX 01 00:00:00
  TRACK 02 AUDIO
TITLE "From Me to You"
    PERFORMER "The Beatles"
    INDEX 01 03:58:02
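
Note that the TITLE lines in the output above have lost their leading spaces: assigning to any field makes awk rebuild the record from its fields, joined by single spaces. If the indentation matters, one possible workaround is to save the leading blanks before the loop and put them back afterwards. The sketch below only illustrates that idea; the function name `keep_indent` is hypothetical, and the per-word logic is reduced to a stub that uppercases the first letter. Runs of several spaces inside the line would still be collapsed.

```shell
# Hypothetical helper: the name is illustrative and the word rules are a stub.
keep_indent() {
  awk '
  /TITLE/{
    # Remember the leading blanks before any field assignment rebuilds $0.
    indent=""
    if(match($0,/^[ \t]+/)) indent=substr($0,RSTART,RLENGTH)
    for(i=2;i<=NF;i++){
      # Stand-in for the real capitalization rules: uppercase the first letter.
      if($i ~ /^"/) $i=substr($i,1,1) toupper(substr($i,2,1)) substr($i,3)
      else $i=toupper(substr($i,1,1)) substr($i,2)
    }
    # Put the saved indentation back in front of the rebuilt record.
    $0=indent $0
  }
  1' "$@"
}
```

It can be used as `keep_indent Input_file`, or at the end of a pipeline.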

What the code takes care of:

It lowercases the words listed in the small-words array.
It writes the words in the exclusion array exactly as the OP spelled them in the question.
It takes the remaining fields, which do not fall into either of the categories above, and capitalizes their first letter.
It also handles words that start or end with a double quote ("): the quote is removed before checking whether the word is present in one of the arrays, and added back in its original position afterwards.
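
The quote handling described in the last point can also be written more compactly by peeling the quotes off into two variables instead of tracking separate flags. This is only a sketch of the same idea, not the code from the answer: the wrapper name `fix_titles` and the variable names `keep`, `small`, `pre` and `post` are made up for illustration, and the word lists are the ones from the question.

```shell
# Hypothetical wrapper; function and variable names are illustrative only.
fix_titles() {
  awk '
  BEGIN{
    # Exclusion list: key is the lowercase form, value is the exact spelling to emit.
    n=split("McCartney feat. vs. CD USA NYC",arr," ")
    for(k=1;k<=n;k++) keep[tolower(arr[k])]=arr[k]
    # Small words that stay lowercase; only the keys matter here.
    n=split("a the to at in on with and but or",arr," ")
    for(k=1;k<=n;k++) small[arr[k]]
  }
  /TITLE/{
    for(i=2;i<=NF;i++){
      word=$i; pre=""; post=""
      # Peel off a leading and/or trailing double quote before the lookup.
      if(word ~ /^"/){ pre="\""; word=substr(word,2) }
      if(word ~ /"$/){ post="\""; word=substr(word,1,length(word)-1) }
      low=tolower(word)
      if(low in keep)        word=keep[low]
      else if(low in small)  word=low
      else                   word=toupper(substr(word,1,1)) substr(word,2)
      # Reattach whichever quotes were removed.
      $i=pre word post
    }
  }
  1' "$@"
}
```

Called as `fix_titles Input_file`, this should treat `"dig`, `mccartney"` and `"a"` the same way as the longer version above. As there, the first-and-last-word rule is not implemented, and assigning to fields drops the leading indentation of TITLE lines.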
