Problem description
Here is my Cassandra schema, using DataStax Enterprise:
CREATE KEYSPACE applications
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
USE applications;
CREATE TABLE events (
    bucket text,
    id timeuuid,
    app_id uuid,
    event text,
    PRIMARY KEY (bucket, id)
);
I want to FILTER in Pig by app_id (UUID) and id (TimeUUID). Here is my Pig script:
events = LOAD 'cql://applications/events'
    USING CqlStorage()
    AS (bucket: chararray, id: chararray, app_id: chararray, event: chararray);
result = FOREACH events GENERATE bucket, id, app_id;
DESCRIBE result;
DUMP result;
Here is the result:
result: {bucket: chararray,id: chararray,app_id: chararray}
(2014-02-28-04,?O]??4??p??M?,;??F? (|?Mb) \n
(2014-02-28-04,?O??4??p??M?,?h^@?E????)
(2014-02-28-04,?V???4??p??M?,;??F? (|?Mb)
(2014-02-28-04,?W?0?4??p??M?,?h^@?E????)
(2014-02-28-04,?X^p?4??p??M?,?h^@?E????)
Notice that the app_id and id fields come back as raw binary, and I need to filter by specific UUIDs. Any suggestions?
Recommended answer
You need to use a UDF to convert the binary bytes of the UUID/TimeUUID to chararray. Don't try to declare the columns as chararray directly, as in AS (bucket: chararray, id: chararray, app_id: chararray, event: chararray); that only reinterprets the 16 raw bytes as text, which is why the output above is unreadable.
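As a rough sketch of the conversion such a UDF would perform, the helper below turns the 16-byte big-endian form of a UUID/TimeUUID into its canonical string. The class name UuidBytes and its methods are hypothetical, not part of Pig or Cassandra, and the sketch assumes the column arrives as the usual 16 bytes (most significant 8 bytes first). It uses only the JDK; in a real UDF you would call it from the exec method of a class extending Pig's EvalFunc, passing in the bytes of the incoming field.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper (name chosen for illustration only).
final class UuidBytes {
    private UuidBytes() {}

    // Converts 16 raw bytes (big-endian: most significant long first)
    // into the canonical 36-character UUID string. Returns null for null input.
    static String toUuidString(byte[] raw) {
        if (raw == null) {
            return null;
        }
        if (raw.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + raw.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
        long mostSigBits = buf.getLong();
        long leastSigBits = buf.getLong();
        return new UUID(mostSigBits, leastSigBits).toString();
    }

    // Inverse direction, handy for testing or for building filter values.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }
}
```

With a UDF built around this conversion, the id and app_id columns would be left undeclared or declared as bytearray in the LOAD statement, converted in a FOREACH, and then compared against a UUID string in a FILTER.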
Or you can use https://github.com/cevaris/pig-dse/blob/master/src/main/java/com/dse/pig/udfs/AbstractCassandraStorage.java, which converts UUID/TimeUUID to String.
File a Cassandra ticket if you think UUIDs should be converted to strings by default.