Problem description
I have a file with three columns, which has pipe as a delimiter. Now some lines in the file can have a "," instead of "|", due to some error. I want to output all such erroneous rows.
Recommended answer
To count the number of columns with awk, you can use the NF variable:
$ cat file
ABC|12345|EAR
PQRST|123|TWOEYES
ssdf|fdas,sdfsf
$ awk -F\| 'NF!=3' file
ssdf|fdas,sdfsf
However, this does not seem to cover all the possible ways the data could be corrupted based on the various revisions of the question and the comments.
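To see why the column count alone is not enough, note that a line such as A,BC|12345|EAR still has three pipe-separated fields even though it contains a stray comma. A minimal sketch, assuming a hypothetical file name sample.txt and assuming that a comma is never legitimate anywhere in the data:

```shell
# Build a small sample file (hypothetical data modelled on the examples in this answer).
printf '%s\n' 'ABC|12345|EAR' 'A,BC|12345|EAR' 'ssdf|fdas,sdfsf' > sample.txt

# Print a line if it does not have exactly three fields OR contains a comma anywhere.
awk -F'|' 'NF != 3 || /,/' sample.txt
```

With this sample, the second line is caught by the comma test and the third by the field count. If commas can legitimately appear inside a field, this check would be too strict.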
A better approach would be to define the exact format that the data must follow. For example, assuming that a line is "correct" if it has three columns, with the first and third containing only letters and the second being numeric, you could write the following script to match all non-conforming lines:
awk -F\| '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0==$2)' file
Test (notice that only the second line (which is conforming) does not get printed):
$ cat file
A,BC|12345|EAR
PQRST|123|TWOEYES
ssdf|fdas,sdfsf
ABC|3983|MAKE,
sf dl lfsdklf |kldsamfklmadkfmask |mfkmadskfmdslafmka
ABC|abs|EWE
sdf|123|123
$ awk -F\| '!(NF==3&&$1$3~/^[a-zA-Z]+$/&&$2+0==$2)' file
A,BC|12345|EAR
ssdf|fdas,sdfsf
ABC|3983|MAKE,
sf dl lfsdklf |kldsamfklmadkfmask |mfkmadskfmdslafmka
ABC|abs|EWE
sdf|123|123
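The one-liner above has two subtleties that may or may not matter for your data: $1$3 tests the concatenation of the two fields, so an empty first field passes as long as the third contains letters, and $2+0==$2 accepts anything awk treats as a number, which can include values like 1e5 or -3. A stricter sketch that checks each field separately, assuming the second column must be an unsigned integer (the function name find_bad_rows is made up for illustration):

```shell
# Print every line that does NOT match the assumed format:
# exactly three fields, letters | digits | letters.
find_bad_rows() {
    awk -F'|' '
        NF != 3              { print; next }   # wrong number of columns
        $1 !~ /^[a-zA-Z]+$/  { print; next }   # first field: letters only, not empty
        $2 !~ /^[0-9]+$/     { print; next }   # second field: digits only (assumption)
        $3 !~ /^[a-zA-Z]+$/  { print; next }   # third field: letters only, not empty
    ' "$@"
}
```

Call it as find_bad_rows file. Writing one rule per field is longer, but it makes it easy to add or relax a single condition later.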
You can adapt the above command to your specific needs, based on what you consider valid input. For example, if you also wanted to restrict each line to fewer than 50 characters, you could do:
awk -F\| '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0==$2 && length($0)<50)' file
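When the file is large, it can help to know where each bad row is. A possible variation, using the same condition but printing the file name and line number in front of each offending line (the function name report_bad_rows and the variable max are made up for illustration):

```shell
# Report non-conforming rows as "file:line: content".
# max is the exclusive length limit; lines must be shorter than this.
report_bad_rows() {
    awk -F'|' -v max=50 '
        !(NF == 3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0 == $2 && length($0) < max) {
            print FILENAME ":" FNR ": " $0
        }
    ' "$@"
}
```

Call it as report_bad_rows file. FILENAME and FNR are standard awk variables, so the same function also works when several files are passed at once.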