I'm doing data analysis with Apache Spark and Apache Cassandra, and I'm struggling to insert a timeuuid field into Cassandra.
I have the following table:
CREATE TABLE leech_seed_report.daily_sessions (
id timeuuid PRIMARY KEY,
app int,
count int,
date bigint,
offline boolean,
vendor text,
version text
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX daily_sessions_app_idx ON leech_seed_report.daily_sessions (app);
CREATE INDEX daily_sessions_date_idx ON leech_seed_report.daily_sessions (date);
CREATE INDEX daily_sessions_offline_idx ON leech_seed_report.daily_sessions (offline);
CREATE INDEX daily_sessions_vendor_idx ON leech_seed_report.daily_sessions (vendor);
CREATE INDEX daily_sessions_version_idx ON leech_seed_report.daily_sessions (version);
I am inserting rows using:
rows.saveToCassandra("leech_seed_report", "daily_sessions", SomeColumns("id", "date", "app", "vendor", "version", "offline", "count"))
My rows are tuples of the form:
([timeuuid_will_be_here], BigInt, Int, String, String, Boolean, Int)
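I've tried inserting into the same table without the timeuuid field and everything works fine, but I can't for the life of me work out how to create a timeuuid for each row.
Any help would be greatly appreciated. I'm new to Spark, Cassandra and Scala, and I feel like I'm banging my head against a brick wall.
Thanks,
Matt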
Best answer
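In the end I tried to use UUIDGen, as suggested by zero323, but I got an error that I think was caused by a missing dependency. Being a Scala newbie, I wasn't sure. After a bit more research, that still looks like the approach I should take, and I'll come back to it when I have more time and experience.
I got Spark working and generating timeuuids with gfc-timeuuid. It was as simple as adding the following to my build.sbt file: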
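For anyone stuck on the same dependency problem, here is a minimal sketch of building a version 1 (time-based) UUID with nothing but the JDK. `SimpleTimeUuid` is a hypothetical helper, not part of Spark, Cassandra or gfc-timeuuid, and it makes simplifying assumptions: a random node ID and clock sequence per JVM, and millisecond clock resolution padded with a counter. Treat it as an illustration of the layout rather than a replacement for a tested library.

```scala
import java.security.SecureRandom
import java.util.UUID

// Hypothetical helper for illustration only.
object SimpleTimeUuid {
  // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
  private val UuidEpochOffset = 0x01B21DD213814000L

  private val random = new SecureRandom()
  // Assumption: random 14-bit clock sequence, fixed for the lifetime of this JVM.
  private val clockSeq: Long = random.nextInt(1 << 14).toLong
  // Assumption: random 48-bit node with the multicast bit set,
  // which marks it as not being a real MAC address.
  private val node: Long =
    (random.nextLong() & 0xFFFFFFFFFFFFL) | 0x010000000000L

  private var lastTimestamp = 0L

  // Returns a strictly increasing timestamp in 100-ns units since the UUID epoch.
  private def nextTimestamp(): Long = synchronized {
    val now = System.currentTimeMillis() * 10000L + UuidEpochOffset
    lastTimestamp = if (now > lastTimestamp) now else lastTimestamp + 1
    lastTimestamp
  }

  def next(): UUID = {
    val ts = nextTimestamp()
    val timeLow = ts & 0xFFFFFFFFL        // low 32 bits of the timestamp
    val timeMid = (ts >>> 32) & 0xFFFFL   // middle 16 bits
    val timeHi  = (ts >>> 48) & 0x0FFFL   // high 12 bits
    // 0x1000L sets the version field to 1 (time-based).
    val msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi
    // Long.MinValue sets the top bit, giving the "10" variant bits.
    val lsb = Long.MinValue | (clockSeq << 48) | node
    new UUID(msb, lsb)
  }
}
```

Because the node and clock sequence are random per JVM, each Spark executor would get its own values, which makes collisions unlikely but not impossible.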
libraryDependencies += "com.gilt" %% "gfc-timeuuid" % "0.0.5"
and then doing the following in my Scala script:
import com.gilt.timeuuid._
val tuuid = TimeUuid()
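To get one timeuuid per row rather than a single value, the generator has to be called inside the `map` over the rows. The sketch below shows this on a plain `Seq` so it runs without Spark. `AttachIds`, `Session` and `attachIds` are hypothetical names, the `date` column is assumed to be a `Long` (matching the `bigint` column), and `newId` stands for whichever generator you use, for example `() => TimeUuid()`, assuming that call returns a `java.util.UUID`.

```scala
import java.util.UUID

// Hypothetical helper for illustration only.
object AttachIds {
  // Row shape from the question, without the id: (date, app, vendor, version, offline, count).
  type Session = (Long, Int, String, String, Boolean, Int)
  type SessionWithId = (UUID, Long, Int, String, String, Boolean, Int)

  // Calls newId once per row and prepends the result,
  // so that every row gets its own identifier.
  def attachIds(rows: Seq[Session], newId: () => UUID): Seq[SessionWithId] =
    rows.map { case (date, app, vendor, version, offline, count) =>
      (newId(), date, app, vendor, version, offline, count)
    }
}
```

The same `map { case ... }` body should carry over to an RDD of tuples, and the result can then be passed to `saveToCassandra` with the column list from the question, since the tuple order matches `SomeColumns("id", "date", "app", "vendor", "version", "offline", "count")`.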