Do Google BigQuery / Amazon Redshift use a column-based relational database or a NoSQL database?

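Problem description

I am still unclear about the difference between a column-based relational database and a column-based NoSQL database.

Google BigQuery supports SQL-like queries, so how can it be NoSQL?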



The column-based relational databases I know of are InfoBright, Vertica, and Sybase IQ.



The column-based NoSQL databases I know of are Cassandra and HBase.

The following article about Redshift starts out by calling it NoSQL, but ends up using PostgreSQL (which is relational):

Solution


There are a few points to clarify about Google BigQuery.


BigQuery is a [...] (Update: there is now a DML language construct for some update/delete operations). Instead, you need to append a new record, and your queries must be written so that they always work with the latest version of your data. If your system is event-driven, this is very simple, because each event is just appended to BigQuery. But if a user updates their profile, you need to store the profile again; you cannot update the old row. You need a version or date column that tells you which version is the most recent, and your queries must first obtain the most recent version of each row and only then apply the rest of the logic.

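You can use a window function such as row_number() with over/partition by on the key field, ordered by that version/date column in descending order, and keep only the most recent row, the one with seqnum=1.

The query below returns, from profile, the latest email for each user_id, where "latest" is defined by the most recent value in the timestamp column.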

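-- Number each user's rows from newest to oldest, then keep only the newest.
-- Note: [profile] is the legacy SQL table-reference syntax.
SELECT email
   FROM
     (SELECT email,
             row_number() over (partition BY user_id
                                ORDER BY TIMESTAMP DESC) seqnum
      FROM [profile]
    )
   WHERE seqnum=1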
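
To make the append-only pattern described above more concrete, here is a minimal Python sketch of the same idea outside of BigQuery. It is only an illustration under stated assumptions: the in-memory list `profile_rows`, the helper names `append_profile` and `latest_emails`, and the sample e-mail addresses are all hypothetical and are not part of the original answer or of any BigQuery API.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for an append-only "profile" table:
# rows are only ever added, never updated in place.
profile_rows = []


def append_profile(user_id, email, timestamp):
    """Store a new version of a profile instead of updating the old row."""
    profile_rows.append(
        {"user_id": user_id, "email": email, "timestamp": timestamp}
    )


def latest_emails(rows):
    """Return {user_id: email} using the most recent row per user_id.

    This mirrors the row_number() ... seqnum=1 query: for every user_id,
    only the row with the greatest timestamp is kept.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["user_id"])
        if current is None or row["timestamp"] > current["timestamp"]:
            latest[row["user_id"]] = row
    return {user_id: row["email"] for user_id, row in latest.items()}


# User 1 changes their e-mail: a second row is appended, the first one stays.
append_profile(1, "old@example.com", datetime(2015, 1, 1))
append_profile(2, "bob@example.com", datetime(2015, 1, 2))
append_profile(1, "new@example.com", datetime(2015, 3, 1))

result = latest_emails(profile_rows)
print(result)
```

All three rows remain stored, and the "current" state is derived at read time, which is the same trade-off the query above makes.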


09-12 05:46