Problem Description
I'm new to Talend and trying to migrate a simple process from an existing ETL into Talend ETL. The process itself is:
Input file --> tMap (a few string manipulations and a lookup) --> write output
The lookup file has 3 columns (a long, a 1-char string, a 2-char string); the long value is the key. The input and lookup files are each around 10 GB. The server spec is 16 cores (2.9 GHz), 64 GB RAM, 8 GB swap, running Linux.
I executed the job with Xmx/Xms values of 30g, 45g, and 50g, but each run failed with either "GC overhead limit exceeded" or out of heap space. I also tried setting "Store temp data" to "true" and raising the buffer size in tMap to a bigger number. That didn't help either.
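(For reference, these values are normally set per job in Talend under Run > Advanced settings > JVM arguments. A minimal sketch that pairs a large heap with the G1 collector; the exact sizes are assumptions to tune for a 64 GB machine, and no heap setting will help if the entire 10 GB lookup must be materialized as Java objects:)

    -Xms16g -Xmx48g -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError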
Has anyone faced such issues with large lookups in Talend?
Thanks
Recommended Answer
As @Th_talend said, try filtering the columns.
You can also try this (if you can):
- Store the two files in staging tables and do the join directly in SQL in a single input, so that the DBMS (SGBD), rather than Talend (tMap), performs the join, as sketched below.
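A minimal sketch of that approach, assuming the two files have already been bulk-loaded into hypothetical staging tables input_stage and lookup_stage (all table and column names here are illustrative, not from the original post):

    -- Index the join key so the database can do an efficient lookup join.
    CREATE INDEX idx_lookup_key ON lookup_stage (lookup_key);

    -- One query that replaces the tMap lookup: the database joins on disk,
    -- so the 10 GB lookup file never has to fit in the JVM heap.
    SELECT i.*,
           l.code1,   -- the 1-char string column
           l.code2    -- the 2-char string column
    FROM input_stage i
    LEFT JOIN lookup_stage l
           ON l.lookup_key = i.lookup_key;

Feeding this single query to a database input component then streams the already-joined rows through the rest of the job.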