Problem description
I am trying to scrub some lists into a properly formatted CSV file for database import.
My starting file looks something like this, with what is supposed to be each "line" (record) spanning multiple lines:
Mr. John Doe
Exclusively Stuff, 186
Caravelle Drive, Ponte Vedra
33487.
I created a sed script that cleans up the file (there's lots of "dirty" formatting, like double spaces and spaces before/after commas). The problem is the ZIP code with the period. I would like to change that period to a newline, but I cannot get it to work.
The command I use is:
sed -E -f scrub.sed test.txt
and the scrub.sed script is as follows:
:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\n |g
$!ba
What I get is:
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487n
I figured that the ZIP plus the period would be a great "delimiter" to run the substitution on, and while I can match it, I can't seem to get sed to put a newline there.
Most of what I found online is about replacing newlines with something else (usually deleting them), but there's not much on replacing something with a newline. I did find this, but it didn't work: How to insert newline character after comma in `),(` with sed?
Is there something I'm missing?
Update:
I edited my scrub.sed file to insert the literal newline as instructed. It still doesn't work:
:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\
|g
$!ba
What I get is (everything on one line):
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487 Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907
My expected output should be:
Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487
Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907
Recommended answer
The sed on BSD does not support the \n representation of a newline in the replacement text (it turns it into a literal n):
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\n next line/'
123n next line
GNU sed does support the \n representation:
$ echo "123." | gsed -E 's/([[:digit:]]*)\./\1\nnext line/'
123
next line
Alternatives are:
Use a single-character delimiter that you then translate into a newline with tr:
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1|next line/' | tr '|' '\n'
123
next line
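Applied to the question's script, the delimiter idea might look like the minimal sketch below. It keeps the question's `:a` / `N` / `$!ba` loop but has the ZIP substitution emit `@` instead of a newline, so the `\n`-to-space join on the next pass has nothing to undo. Assumptions: `@` never occurs in the real data, every record ends in a five-digit ZIP followed by a period, the helper name `scrub_tr` is made up, and the sample input is hypothetical (laid out like the question's, with a comma assumed after each name). It uses `/` as the `s` delimiter because a `|` placeholder would clash with the `s|...|...|` delimiter in scrub.sed.

```shell
#!/bin/sh
# Hypothetical helper: same loop as the question's scrub.sed, but the ZIP
# substitution emits an "@" placeholder (assumed never to appear in the
# data) instead of a newline; tr turns each "@" into a newline afterwards.
scrub_tr() {
  sed -E '
:a
N
# Join the line just appended, then tidy whitespace around commas.
s/\n/ /g
s/[[:space:]]+/ /g
s/ ?, ?/,/g
# " 12345." becomes ",FL,12345@"; the join above never touches "@".
s/[[:space:]]([0-9]{5})\.[[:space:]]*/,FL,\1@/g
# Drop the space that the next join puts after a placeholder.
s/@ /@/g
$!ba
# Only reached on the last line: drop the final placeholder.
s/@$//
' "$@" | tr '@' '\n'
}

# Hypothetical sample input in the layout of the question.
printf '%s\n' 'Mr. John Doe,' 'Exclusively Stuff, 186' \
  'Caravelle  Drive, Ponte Vedra' '33487.' \
  'Mrs. Jane Smith,' 'Props and Stuff , 123' \
  'Main Drive, Jacksonville' '33690.' | scrub_tr
```

With the sample input it prints one line per record, in the format of the expected output in the question, with no trailing blank line.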
Or use an escaped literal new line in your sed script:
$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\
next line/'
123
next line
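The escaped newline on its own did not fix the script from the update in the question, and the likely reason is the loop rather than the newline: `:a` / `N` / `$!ba` runs every substitution again each time a line is appended, so on the next pass `s|\n| |g` turns the freshly inserted newline back into a space. One way around that is to read the whole file into the pattern space first and run each substitution once, after the loop. This is a minimal sketch, assuming every record ends in a five-digit ZIP followed by a period; the helper name `scrub_slurp` is made up, and the sample input is hypothetical (laid out like the question's, with a comma assumed after each name).

```shell
#!/bin/sh
# Hypothetical helper: read the whole input first, then clean it up once,
# so no later pass can turn the inserted newline back into a space.
scrub_slurp() {
  sed -E '
# Keep appending lines until the last one has been read.
:a
$!N
$!ba
# Join the physical lines, then tidy whitespace around commas.
s/\n/ /g
s/[[:space:]]+/ /g
s/ ?, ?/,/g
# " 12345." becomes ",FL,12345" followed by an escaped literal newline.
s/[[:space:]]([0-9]{5})\.[[:space:]]*/,FL,\1\
/g
# Drop the newline left after the last record; sed prints its own.
s/\n$//
' "$@"
}

# Hypothetical sample input in the layout of the question.
printf '%s\n' 'Mr. John Doe,' 'Exclusively Stuff, 186' \
  'Caravelle  Drive, Ponte Vedra' '33487.' \
  'Mrs. Jane Smith,' 'Props and Stuff , 123' \
  'Main Drive, Jacksonville' '33690.' | scrub_slurp
```

Because the loop now only collects lines, the newline inserted after each ZIP survives to the output. Using `$!N` rather than plain `N` also keeps a one-line file from hitting `N` on its last line, which GNU and BSD sed handle differently.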
Or use awk:
$ echo "123." | awk '/^[[:digit:]]+\./{sub(/\./,"\nnext line")} 1'
123
next line
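For the multi-line records in the question, awk can also build each record itself instead of patching a newline back in. This is a minimal sketch, assuming (as in the question) that a record is complete once it ends in a five-digit ZIP followed by a period; the helper name `scrub_awk` and the sample input are hypothetical (laid out like the question's, with a comma assumed after each name), and any lines left over at the end of the file without a ZIP are simply dropped. It spells out `[0-9]` five times instead of `[0-9]{5}` because some older awk implementations do not support interval expressions.

```shell
#!/bin/sh
# Hypothetical helper: join physical lines into a record until the record
# ends in "NNNNN." (a 5-digit ZIP plus a period), then print it as CSV.
scrub_awk() {
  awk '
  # Append the current physical line to the record being built.
  { rec = (rec == "") ? $0 : (rec " " $0) }

  # match() returns the start position (nonzero) when the record ends in a ZIP.
  match(rec, /[0-9][0-9][0-9][0-9][0-9]\.[ \t]*$/) {
    zip = substr(rec, RSTART, 5)
    rec = substr(rec, 1, RSTART - 1)
    gsub(/[ \t]+/, " ", rec)      # collapse runs of blanks
    gsub(/ ?, ?/, ",", rec)       # no spaces around commas
    sub(/ $/, "", rec)            # drop the blank before the ZIP
    print rec ",FL," zip
    rec = ""
  }
  ' "$@"
}

# Hypothetical sample input in the layout of the question.
printf '%s\n' 'Mr. John Doe,' 'Exclusively Stuff, 186' \
  'Caravelle  Drive, Ponte Vedra' '33487.' \
  'Mrs. Jane Smith,' 'Props and Stuff , 123' \
  'Main Drive, Jacksonville' '33690.' | scrub_awk
```

Since awk prints each record with its own output line ending, no newline ever has to be inserted into the middle of a string.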
Or use a sed that supports \n, such as GNU sed.