Talend: an equivalent of Logstash's "key value" filter

Problem description

I'm getting started with Talend Open Source Data Integrator, and I would like to convert my data file into a CSV file.

My data consists of lines of key-value pairs, like this example:

A=0 B=3 C=4
A=2 C=4
A=2 B=4
A= B=3 C=1

I want to transform it into a CSV like this one:

A,B,C
0,3,4
2,,4
2,4,
,3,1

With Logstash, I was using the "key value" (kv) filter, which does this job in a few lines of configuration. In Talend, however, I can't find a similar transformation. I tried a "delimited file" job and a few others, without success.
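For reference, here is a minimal Python sketch (not Talend) of the transformation being asked for, roughly what Logstash's kv filter followed by a CSV output produces. It assumes whitespace-separated tokens and a column list known in advance; kv_to_csv is a hypothetical helper name, not part of any Talend or Logstash API.

```python
import csv
import io


def kv_to_csv(lines, columns):
    """Hypothetical helper: turn lines of space-separated key=value pairs into CSV text.

    Assumption: the header order (columns) is known up front.
    Missing keys and empty values ("A=") both become empty CSV fields.
    """
    out = io.StringIO()
    # restval defaults to "", so keys absent from a line produce empty fields;
    # a key not listed in columns raises ValueError (extrasaction="raise").
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        record = {}
        for token in line.split():
            # Split on the first "=" only, so a value may itself contain "=".
            key, _, value = token.partition("=")
            record[key] = value
        writer.writerow(record)
    return out.getvalue()


data = ["A=0 B=3 C=4", "A=2 C=4", "A=2 B=4", "A= B=3 C=1"]
print(kv_to_csv(data, ["A", "B", "C"]), end="")
```

With the sample data this prints the header A,B,C followed by one CSV row per input line. Requiring the column list up front keeps the header order deterministic; collecting the union of keys in a first pass would be an alternative.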

Solution

Corentin's answer is excellent, but here is an improved version of it that uses fewer components:

Instead of using tFileInputRaw and tConvertType, I used tFileInputFullRow, which reads the file line by line into a string.
Instead of splitting the string manually (where you need to check for nulls), I used tExtractDelimitedFields with "=" as a separator in order to extract a key and a value from the "key=value" column.
The end result is the same, with an extra column at the beginning.
If you want to delete that column, a dirty hack is to read the output file with a tFileInputFullRow, use a tReplace with a regex such as ^[^;]+; to replace everything up to and including the first ";" on each line with an empty string, and write the result to another file.
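The steps above can be mimicked in plain Python to see where the extra column comes from and what the regex removes. This is a rough analogue, not Talend code: full_rows, pivot and drop_first_column are hypothetical names, the ";" delimiter matches the regex in the answer, and treating the extra leading column as a per-row key (here a row number) that groups each row's pairs is an assumption.

```python
import re


def full_rows(text):
    # Rough analogue of tFileInputFullRow: each line becomes one string.
    return text.splitlines()


def pivot(rows, columns, sep=";"):
    """Hypothetical stand-in for the rest of the job.

    Each row is split into "key=value" tokens, each token is split on "="
    (the tExtractDelimitedFields step), and the pairs are pivoted back into
    one output line per input row. Assumption: the leading field is a row
    number used as the grouping key, i.e. the "extra column".
    Assumes every token contains "=".
    """
    out = [sep.join(["row"] + columns)]
    for row_id, row in enumerate(rows, start=1):
        values = dict(token.split("=", 1) for token in row.split())
        out.append(sep.join([str(row_id)] + [values.get(c, "") for c in columns]))
    return out


# The "dirty hack": remove everything up to and including the first ";".
FIRST_FIELD = re.compile(r"^[^;]+;")


def drop_first_column(lines):
    return [FIRST_FIELD.sub("", line) for line in lines]


data = "A=0 B=3 C=4\nA=2 C=4\nA=2 B=4\nA= B=3 C=1\n"
with_id = pivot(full_rows(data), ["A", "B", "C"])
print("\n".join(with_id))
print("\n".join(drop_first_column(with_id)))
```

Note that ^[^;]+; needs at least one character before the first ";", so a line whose first field is empty is left unchanged. That is harmless here, because the assumed row key is never empty.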


09-06 23:45