Problem description
How do we build a normalized table from a denormalized text file?
Thank you for your answers and your time.
We need to build a normalized database table from a denormalized text file. We have explored a couple of options, such as the Unix shell and PostgreSQL, and I would like to learn better approaches from this community.
The input text file contains comma-delimited records of varying length. The content may look like this:
XXXXXXXXXX , YYYYYYYYYY, TTTTTTTTTTT, UUUUUUUUUU, RRRRRRRRR,JJJJJJJJJ
111111111111, 22222222222, 333333333333, 44444444, 5555555, 666666
EEEEEEEE,WWWWWW,QQQQQQQ,PPPPPPPP
We would like to normalize it as follows (split and pair):
XXXXXXXXXX , YYYYYYYYYY
TTTTTTTTTTT, UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111, 22222222222
333333333333, 44444444
5555555, 666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP
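Do we need to pre-process the text and then load it?
If so, what is the best way to pre-process the file?
Is there a single SQL statement or function that can produce the result above?
Thanks for your help.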
Recommended answer
Using GNU awk (required because RS is set to a regular expression):
awk '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' RS="[,\n]" file
XXXXXXXXXX,YYYYYYYYYY
TTTTTTTTTTT,UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111,22222222222
333333333333,44444444
5555555,666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP
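If GNU awk is not available, roughly the same split-and-pair result can be obtained without a regular-expression RS by first turning every comma into a newline with tr. This is a minimal sketch, not part of the original answer; the function name pair_fields is hypothetical, and it assumes the input has no empty lines and an even number of fields overall.

```shell
# pair_fields is a hypothetical helper name, not part of the original answer.
# It reads comma-delimited text on stdin and writes one pair of fields per line.
pair_fields() {
  # Step 1: tr puts every field on its own line.
  # Step 2: awk trims each field and joins odd/even fields into pairs.
  tr ',' '\n' |
  awk '{ $1 = $1 }                             # rebuild the record to trim surrounding blanks
       NR % 2 == 1 { printf "%s,", $0; next }  # odd field: print it followed by a comma
       { print }                               # even field: print it and end the line
  '
}

# Example run on two lines taken from the sample input in the question.
printf '%s\n' 'XXXXXXXXXX , YYYYYYYYYY, TTTTTTTTTTT, UUUUUUUUUU' 'EEEEEEEE,WWWWWW' | pair_fields
```

Because tr and the awk features used here are specified by POSIX, this variant should behave the same across awk implementations.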
How the command works:
{$1=$1}
Cleans up the record and removes extra spaces.
NR%2==1 {printf "%s,",$0}
Prints the odd-numbered parts, each followed by a comma.
NR%2==0
Prints the even-numbered parts, each followed by a newline.
RS="[,\n]"
Sets the record separator to a comma or a newline.
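To see how the record separator and the cleanup step interact, it can help to print each record with its record number before any pairing happens. The sketch below is only an illustration: show_records is a hypothetical name, RS is passed with -v rather than as a trailing assignment, and it assumes an awk that accepts a regular expression in RS (GNU awk does).

```shell
# show_records is a hypothetical helper name used only for this illustration.
# Assumes an awk that accepts a regular expression in RS (GNU awk does).
show_records() {
  # RS splits the input on commas and newlines; $1 = $1 trims each record;
  # NR is the record number that the NR%2 tests in the answer rely on.
  awk -v RS='[,\n]' '{ $1 = $1; print NR ": " $0 }'
}

printf '%s\n' 'AAA , BBB, CCC,DDD' | show_records
```

Records with an odd number are the ones the answer prints with a trailing comma, and records with an even number are the ones that end each output line.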