Problem description
I'm new to Talend Open Source Data Integrator and I would like to transform my data file into a CSV file.
My data consists of sets of key-value pairs, as in this example:
A=0 B=3 C=4
A=2 C=4
A=2 B=4
A= B=3 C=1
I want to transform it into a CSV like this one:
A,B,C
0,3,4
2,,4
2,4,
,3,1
With Logstash, I was using the "key value" filter, which can do this job with a few lines of configuration. With Talend, I can't find a similar transformation. I tried a "delimited file" job and some other jobs without success.
Corentin's answer is excellent, but here's an enhanced version of it, which cuts down on some components:
Instead of using tFileInputRaw and tConvertType, I used tFileInputFullRow, which reads the file line by line into a string.
Instead of splitting the string manually (where you need to check for nulls), I used tExtractDelimitedFields with "=" as a separator in order to extract a key and a value from the "key=value" column.
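For comparison, the manual split mentioned above could look roughly like the following plain-Java sketch, for example inside a custom routine. The class and method names are hypothetical and not part of Talend, and the fixed column list A, B, C and the comma separator are assumptions taken from the question. The main point is that an empty value such as "A=" needs explicit handling, because String.split drops trailing empty strings unless a limit is given.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Talend API: parses "key=value" tokens by hand.
class KeyValueSplit {

    // Returns {key, value}; the value is "" when it is missing ("A=" or "A").
    static String[] splitPair(String token) {
        // A limit of 2 keeps the trailing empty string, so "A=" yields {"A", ""}.
        // Without the limit, "A=".split("=") would return only {"A"}.
        String[] parts = token.split("=", 2);
        String value = parts.length > 1 ? parts[1] : "";
        return new String[] { parts[0], value };
    }

    // Builds one CSV row for the given (assumed) column order.
    static String toCsvRow(String line, String[] columns) {
        Map<String, String> values = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            String[] kv = splitPair(token);
            values.put(kv[0], kv[1]);
        }
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                row.append(',');
            }
            // Keys absent from the line become empty cells.
            row.append(values.getOrDefault(columns[i], ""));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        String[] columns = { "A", "B", "C" };
        String[] lines = { "A=0 B=3 C=4", "A=2 C=4", "A=2 B=4", "A= B=3 C=1" };
        System.out.println(String.join(",", columns));
        for (String line : lines) {
            System.out.println(toCsvRow(line, columns));
        }
    }
}
```

Using tExtractDelimitedFields avoids writing this kind of length check yourself, which is why the component-based version is shorter.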
The end result is the same, with an extra column at the beginning.
If you want to delete the column, a dirty hack would be to read the output file using a tFileInputFullRow, and use a regex like ^[^;]+; in a tReplace to replace everything up to (and including) the first ";" in the line with an empty string, and write the result to another file.
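To see what that regex does to a single line, here is a minimal plain-Java sketch. The class name, the method name, and the sample row are hypothetical (the content of the extra column is not specified above); only the pattern ^[^;]+; comes from the answer.

```java
// Hypothetical helper, not a Talend API: drops the first ';'-separated field.
class FirstColumnStripper {

    static String stripFirstColumn(String line) {
        // ^[^;]+; matches one or more non-';' characters at the start of the
        // line, plus the ';' that follows them.
        return line.replaceFirst("^[^;]+;", "");
    }

    public static void main(String[] args) {
        // Illustrative row: an extra leading field followed by the real values.
        System.out.println(stripFirstColumn("1;0;3;4"));
    }
}
```

Because the pattern uses +, a line whose first field is empty (one that starts with ";") does not match and is left unchanged. If the extra column can be empty, a pattern like ^[^;]*; may be a better fit.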