在pentaho中,当我运行一个大约50000行的cassandra输入步骤时,会出现以下异常:
有没有办法在pentaho中控制查询结果的大小?或者有没有一种方法可以流式处理查询结果,而不是批量获取所有结果?

2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Unexpected error
2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.pentaho.di.core.exception.KettleException:
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 -
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:355)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.processRow(CassandraInput.java:234)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2014/10/09 15:14:09 - Cassandra Input.0 -   at java.lang.Thread.run(Unknown Source)
2014/10/09 15:14:09 - Cassandra Input.0 - Caused by: org.apache.thrift.transport.TTransportException: Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1656)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1642)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.cassandra.legacy.LegacyCQLRowHandler.newRowQuery(LegacyCQLRowHandler.java:289)
2014/10/09 15:14:09 - Cassandra Input.0 -   at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:333)
2014/10/09 15:14:09 - Cassandra Input.0 -   ... 3 more
2014/10/09 15:14:09 - Cassandra Input.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=1)
2014/10/09 15:14:09 - all customer data - Transformation detected one or more steps with errors.
2014/10/09 15:14:09 - all customer data - Transformation is killing the other steps!

最佳答案

org.apache.thrift.transport.TTransportException:
  Frame size (17727647) larger than max length (16384000)!

为了避免性能下降,会对帧(节省消息)的大小进行限制。你可以通过修改一些设置来调整它。这里需要注意的重要一点是,您需要设置bot客户端大小和服务器端的设置。
服务器端输入cassandra.yaml
# Frame size for thrift (maximum field length).
# default is 15mb, you'll have to increase this to at-least 18.
thrift_framed_transport_size_in_mb: 18

# The max length of a thrift message, including all fields and
# internal thrift overhead.
# default is 16, try to keep it to thrift_framed_transport_size_in_mb + 1
thrift_max_message_length_in_mb: 19

设置客户端限制取决于您使用的驱动程序。

关于database - Pentaho框架尺寸(17727647)比最大长度(16384000)大!,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/26286359/

10-12 18:54