Problem description
Assume a multi-line file in which strings are separated by one or more whitespace characters, and in which groups of strings can be enclosed in double quotes.
> cat file
foo bar "foobar baz qux"
foo "bar foobar baz" qux
"foo bar foobar" baz qux # multiple whitespaces in this line
If I try to replace all whitespace outside the double quotes with single tab characters using the awk command below, I get the following:
awk '{OFS="\t"; FPAT="([^, ]+)|(\"[^\"]+\")"; $1=$1; print}' file
# foo bar "foobar baz qux" # In this line, strings inside the quote are separated by tabs
# foo "bar foobar baz" qux
# "foo bar foobar" baz qux
The problem seems to affect only the line that ends with a double quote.
EDIT 1: To better visualize the issue at hand:
awk '{OFS="\t"; FPAT="([^, ]+)|(\"[^\"]+\")"; $1=$1; print}' file | cat -A
# foo^Ibar^I"foobar^Ibaz^Iqux"$
# foo^I"bar foobar baz"^Iqux$
# "foo bar foobar"^Ibaz^Iqux$
EDIT 2: Both commands suggested in the answer section appear to work fine unless certain non-letter characters, or combinations of them, are present in the input. Here is an example:
> cat file
foo_bar_baz foo foo_bar . Name=foo;product="bar baz qux"
foo_bar_baz foo foo_bar . Name=foo;product="bar baz qux"
foo_bar_baz foo foo_bar . Name=foo;product="bar baz qux"
> awk -v FPAT='"[^"]*"|[^[:blank:]]+' -v OFS='\t' '{$1=$1} 1' file | cat -A
foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
> awk '{$1=$1}1' OFS='\t' FPAT='"[^"]+"|[^ ]+' file | cat -A
foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
foo_bar_baz^Ifoo^Ifoo_bar^I.^IName=foo;product="bar^Ibaz^Iqux"$
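A likely explanation for the output above: in `Name=foo;product="bar baz qux"` the opening quote sits in the middle of a token, so the "non-blank run" alternative of both patterns starts matching at `N` and stops at the first space inside the quotes. The following is only a sketch of one possible workaround, not the method from the answer section. It swaps `FPAT` for a plain `match()` loop so that it does not depend on GNU awk, and it treats a token as any run of quoted strings and characters that are neither blanks nor double quotes. The variable names `line`, `result`, `rest`, `token`, and `out` are illustrative.

```shell
# One record from the EDIT 2 sample, where the quote opens mid-token.
line='foo_bar_baz foo foo_bar . Name=foo;product="bar baz qux"'

result=$(printf '%s\n' "$line" | awk '
{
    out = ""
    rest = $0
    # A token is a run of complete quoted strings and/or characters
    # that are neither blanks (space, tab) nor double quotes.
    while (match(rest, /([^ \t"]|"[^"]*")+/)) {
        token = substr(rest, RSTART, RLENGTH)
        out = (out == "" ? token : out "\t" token)
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out
}')

printf '%s\n' "$result"
```

The same regular expression should also work as an `FPAT` value in GNU awk. A double quote that is never closed does not match either alternative and is dropped by this sketch, so input with unbalanced quotes would need separate handling.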
EDIT 3: The issue raised in EDIT 2 is discussed further here: Replacing whitespace with single tab unless in double quotes - Part II
Recommended answer
Using GNU awk, you can do this easily:
awk -v FPAT='"[^"]*"|[^[:blank:]]+' -v OFS='\t' '{$1=$1} 1' file
foo bar "foobar baz qux"
foo "bar foobar baz" qux
"foo bar foobar" baz qux