Question
I'm still not very clear about the difference between a column-based relational database and a column-based NoSQL database.
Google BigQuery supports SQL-like queries, so how can it be NoSQL?
The column-based relational databases I know of are InfoBright, Vertica and Sybase IQ.
The column-based NoSQL databases I know of are Cassandra and HBase.
The following article about Redshift opens by calling it "NoSQL" but ends by describing PostgreSQL (which is relational) being used: http://nosqlguide.com/column-store/intro-to-amazon-redshift-a-columnar-nosql-database/
Recommended answer
A few things to clarify here, mostly about Google BigQuery.
BigQuery is append-only: you cannot update existing rows in place (Update: there is now a DML language construct for some update/delete operations). Instead, you need to append a new record, and your queries must be written so that they always work with the latest version of your data. If your system is event-driven, this is very simple, because each event is just appended to BQ. But if a user updates their profile, you need to store the profile again; you cannot update the old row. You need a version/date column that tells you which row is the most recent version, and your queries should first obtain the most recent version of each row and only then apply the rest of the logic.
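To make the append-and-version pattern described above concrete, here is a minimal Python sketch. It assumes an in-memory list stands in for a BigQuery table; `ProfileRow`, `AppendOnlyProfiles`, `save`, `rows` and `latest` are hypothetical names chosen for illustration and are not part of any BigQuery API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ProfileRow:
    # Hypothetical row shape: one immutable version of a user's profile.
    user_id: int
    email: str
    timestamp: datetime


class AppendOnlyProfiles:
    """Hypothetical in-memory stand-in for an append-only table."""

    def __init__(self) -> None:
        self._rows: list[ProfileRow] = []

    def save(self, user_id: int, email: str, timestamp: datetime) -> None:
        # There is no UPDATE: a changed profile is stored again as a new row
        # carrying a newer timestamp, and older rows stay untouched.
        self._rows.append(ProfileRow(user_id, email, timestamp))

    def rows(self) -> tuple[ProfileRow, ...]:
        # Every version ever written, in insertion order.
        return tuple(self._rows)

    def latest(self, user_id: int) -> Optional[ProfileRow]:
        # Readers resolve the current state themselves by taking the newest version.
        versions = [r for r in self._rows if r.user_id == user_id]
        return max(versions, key=lambda r: r.timestamp) if versions else None
```

Saving two emails for the same user leaves both rows visible in `rows()`, while `latest()` returns only the one with the newer timestamp, which is exactly the "read the most recent version first" rule the answer describes.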
You can use something like OVER (PARTITION BY ...) on that field and keep only the most recent value, i.e. the row where seqnum=1.
This returns, from profile, the last email for each user_id, where "last" means the most recent entry according to the timestamp column.
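SELECT email
FROM
  (SELECT email,
          -- number each user's rows newest-first; seqnum = 1 is the latest
          ROW_NUMBER() OVER (PARTITION BY user_id
                             ORDER BY timestamp DESC) AS seqnum
   FROM [profile]
  )
-- keep only the newest row per user_id
WHERE seqnum = 1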
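For readers less familiar with window functions, the following Python sketch emulates what the query computes: it partitions rows by user_id, numbers each partition newest-first by timestamp, and keeps seqnum = 1. It assumes rows are plain dicts; `latest_per_key` and the sample data are hypothetical, and this is only an illustration of the semantics, not how BigQuery executes the query.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_key(rows, key, order_by):
    """Emulate ROW_NUMBER() OVER (PARTITION BY key ORDER BY order_by DESC)
    and keep only the rows where seqnum = 1."""
    result = []
    # groupby only merges adjacent items, so sort by the partition key first.
    ordered = sorted(rows, key=itemgetter(key))
    for _, partition in groupby(ordered, key=itemgetter(key)):
        # Number each partition newest-first, as ORDER BY ... DESC does.
        newest_first = sorted(partition, key=itemgetter(order_by), reverse=True)
        for seqnum, row in enumerate(newest_first, start=1):
            if seqnum == 1:
                result.append(row)
    return result


# Hypothetical sample rows shaped like the profile table in the answer.
profile = [
    {"user_id": 1, "email": "a-old@example.com", "timestamp": 1},
    {"user_id": 1, "email": "a-new@example.com", "timestamp": 5},
    {"user_id": 2, "email": "b@example.com", "timestamp": 3},
]
emails = [row["email"] for row in latest_per_key(profile, "user_id", "timestamp")]
print(emails)  # ['a-new@example.com', 'b@example.com']
```

One difference worth noting: when two rows share the same timestamp, this sketch keeps whichever came first in the input (Python's sort is stable), whereas SQL's ROW_NUMBER() makes no such guarantee about ties.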
This concludes the article "Does Google BigQuery/Amazon Redshift use a column-based relational database or a NoSQL database?"; we hope the recommended answer is helpful.