本文介绍了Oracle SQL-Loader 有效地处理值中的内部双引号的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我遇到了一些 Oracle SQL Loader 挑战,正在寻找一种高效且简单的解决方案.我要加载的源文件是用管道 | 分隔的,其中值用双引号 " 括起来.问题似乎是某些值包含内部双引号.

I have some Oracle SQL Loader challenges and looking for an efficient and simple solution.my source files to be loaded are pipe | delimited, where values are enclosed by Double Quotes ".the problem seems to be that some of the values contains internal Double Quotes.

例如:..."|"a":"b"|"...

这导致我的记录被拒绝,理由是:

this causes my records to be rejected under the excuse of:

no terminator found after TERMINATED and ENCLOSED field

网络上有各种解决方案,但似乎都不适合:

there are various solutions over the web but non seems to fit:

[1]我试图在引用引号时替换所有内部双引号,但似乎在控制文件上的太多字段上应用此功能时(我有大约 2000 多个字段并使用 FILLER 仅加载一个子集)加载程序再次抱怨:

[1]I have tried to replace all internal double quotes in quoting the quotes, but it seems that when applying this function on too many fields on the control files(I have ~2000+ fields and using FILLER to load only a subset) the loader complains again:

SQL*Loader-350: Syntax error at line 7.
Expecting "," or ")", found ",".
field1  char(36) "replace(:field1,'"','""')",

(我不知道为什么,但是当在一个狭窄的列子集上应用这个解决方案时,它似乎有效)

(I do not know why but when applying this solution on a narrow subset of columns it does seem to work)

事情是潜在的所有字段都可能包含内部双引号.

thing is that potentially all fields may include internal double quotes.

[2]当省略全局 可选地用 '"' 括起来时,我能够加载所有数据,但随后所有括起来的引号都成为目标表中数据的一部分.

[2]I'm able to load all data when omitting the global optionally enclosed by '"', but then all enclosing quotes becomes part of the data in the target table.

[3]我可以省略全局 可选地包含在 '"' 语句中,并将其仅放置在选定的字段中,虽然尝试在余数上使用 "replace(:field1,'"','""')" 语句,但这很难实现,因为我无法提前知道包含内部双引号的可疑字段是什么.

[3]I can omit the global optionally enclosed by '"' statement and place it only at selected fields,while try to "replace(:field1,'"','""')" statement on the remainder, but this is difficult to implement,as I cannot know ahead what are the suspected fields to include internal double quotes.

这是我的问题:

  1. 有没有简单的方法可以说服加载器小心处理内部双引号(当值被它们括起来时)?

  1. is there no simple way to convince the loader to handle with care internal double quotes (when values are enclosed by them)?

如果我被迫临时修复数据,是否有一个单行 Linux 命令仅将内部双引号转换为另一个字符串/字符,比如说,单引号?

if I'm forced to fix the data ad-hock, is there a one liner Linux command to convert only internal double quotes to another string/char, say, single quotes?

如果我被迫将带引号的数据加载到目标表,是否有一种简单的方法可以从所有字段中删除封闭的双引号,一次全部(该表有 ~1000 列).对于非常大的表,该解决方案的实际性能是否明智?

if I'm forced to load data with the quotes to the target table, is there a simple way to remove the enclosing double quotes from all fields,all at once (the table has ~1000 columns). is the solution practical performance wise to very large tables?

推荐答案

如果在封闭的字段中从未有管道,则可以从控制文件中进行.如果您可以在一个字段中同时使用管道和双引号,那么不幸的是,我认为您别无选择,只能对文件进行预处理.

If you never had pipes in the enclosed fields you could do it from the control file. If you can have both pipes and double-quotes within a field then I think you have no choice but to preprocess the files, unfortunately.

您的解决方案 [1],替换双引号 使用 SQL 运算符,发生得太晚而无用;SQL*Loader 在执行 SQL 步骤之前已经解释了分隔符和附件.您的解决方案 [2],忽略外壳,将与 [1] 结合使用 - 直到其中一个字段确实包含管道字符.解决方案 [3] 与全局使用 [1] 和/或 [2] 存在相同的问题.

Your solution [1], to replace double-quotes with an SQL operator, is happening too late to be useful; the delimiters and enclosures have already been interpreted by SQL*Loader before it does the SQL step. Your solution [2], to ignore the enclosure, would work in combination with [1] - until one of the fields did contain a pipe character. And solution [3] has the same problems as using [1] and/or [2] globally.

有关指定分隔符的文档提到:

有时作为分隔符的标点符号也必须包含在数据中.为了实现这一点,两个相邻的分隔符被解释为该字符的一次出现,并且该字符包含在数据中.

换句话说,如果您在字段内重复使用双引号,那么它们将被转义并出现在表数据中.由于您无法控制数据生成,您可以预处理您得到的文件,用转义双引号替换所有双引号.除非您不想替换所有 - 那些实际上是真正的围栏的人不应该被转义.

In other words, if you repeated the double-quotes inside the fields then they would be escaped and would appear in the table data. As you can't control the data generation, you could preprocess the files you get to replace all the double-quotes with escaped double quotes. Except you don't want to replace all of them - the ones that are actually real enclosures should not be escaped.

您可以使用正则表达式来定位相关字符会跳过其他字符.不是我的强项,但我认为您可以使用 前瞻和后视断言来做到这一点.

You could use a regular expression to target the relevant characters will skipping others. Not my strong area, but I think you can do this with lookahead and lookbehind assertions.

如果您有一个名为 orig.txt 的文件,其中包含:

If you had a file called orig.txt containing:

"1"|A|"B"|"C|D"
"2"|A|"B"|"C"D"
3|A|""B""|"C|D"
4|A|"B"|"C"D|E"F"G|H""

你可以这样做:

perl -pe 's/(?<!^)(?<!\|)"(?!\|)(?!$)/""/g' orig.txt > new.txt

查找前面没有行起始锚或管道字符的双引号;并且后面没有管道字符或行尾锚点;并仅替换那些带有转义(双引号)双引号的内容.这将使 new.txt 包含:

That looks for a double-quote which is not preceded by the line-start anchor or a pipe character; and is not followed by a pipe character or line end anchor; and replaces only those with escaped (doubled) double-quotes. Which would make new.txt contain:

"1"|A|"B"|"C|D"
"2"|A|"B"|"C""D"
3|A|"""B"""|"C|D"
4|A|"B"|"C""D|E""F""G|H"""

字段开头和结尾的双引号没有被修改,但中间的双引号现在被转义了.如果您随后使用带有双引号括起来的控制文件加载它:

The double-quotes at the start and end of fields are not modified, but those in the middle are now escaped. If you then loaded that with a control file with double-quote enclosures:

load data
truncate
into table t42
fields terminated by '|' optionally enclosed by '"'
(
  col1,
  col2,
  col3,
  col4
)

然后你会得到:

select * from t42 order by col1;

      COL1 COL2       COL3       COL4                
---------- ---------- ---------- --------------------
         1 A          B          C|D                 
         2 A          B          C"D                 
         3 A          "B"        C|D                 
         3 A          B          C"D|E"F"G|H"        

希望与您的原始数据匹配.可能存在不起作用的边缘情况(例如双引号后跟字段内的管道)但是尝试解释其他人的数据时可以做的事情是有限的......当然,也可能有(很多)更好的正则表达式模式.

which hopefully matches your original data. There may be edge cases that don't work (like a double-quote followed by a pipe within a field) but there's a limit to what you can do to attempt to interpret someone else's data... There may also be (much) better regular expression patterns, of course.

您也可以考虑使用外部表 而不是 SQL*Loader,如果数据文件位于(或可以)在 Oracle 目录中并且您具有正确的权限.您仍然需要修改文件,但您可以使用 preprocessor 指令,而不需要在调用 SQL*Loader 之前显式执行此操作.

You could also consider using an external table instead of SQL*Loader, if the data file is (or can be) in an Oracle directory and you have the right permissions. You still have to modify the file, but you could do it automatically with the preprocessor directive, rather than needing to do that explicitly before calling SQL*Loader.

这篇关于Oracle SQL-Loader 有效地处理值中的内部双引号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-11 19:51