Problem description
I'm relatively new to GCP and just starting to set up and evaluate my organization's architecture on GCP.
Scenario:
Data will flow into a Pub/Sub topic (high frequency, low amount of data). The goal is to move that data into Bigtable. From my understanding, you can do that either by having a Cloud Function trigger on the topic or by using Dataflow.
Now, I have previous experience with Cloud Functions, which I am satisfied with, so that would be my pick.
I fail to see the benefit of choosing one over the other. So my question is: when should I choose one of these products over the other?
Thanks
Recommended answer
Both solutions could work. Dataflow will scale better if your Pub/Sub traffic grows to large amounts of data, but Cloud Functions should work fine for low volumes; I would look at this page (especially the rate-limits section) to make sure you fit within the Cloud Functions quotas: https://cloud.google.com/functions/quotas
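For illustration, here is a minimal sketch of the Dataflow side, using the Apache Beam Python SDK to read from Pub/Sub and write to Bigtable. The project, instance, table, topic, and column-family names ("cf1") are placeholders, and the row-key scheme is just one example; the column family is assumed to already exist on the table.

```python
import datetime
import hashlib

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable.row import DirectRow

# Placeholder resource names -- substitute your own.
PROJECT, INSTANCE, TABLE = "my-project", "my-instance", "events"
TOPIC = "projects/my-project/topics/events"


def to_bigtable_row(payload: bytes) -> DirectRow:
    """Turn one Pub/Sub payload into a Bigtable mutation.

    The row key is derived from the payload itself, so reprocessing the
    same message maps to the same row (example scheme only).
    """
    row_key = hashlib.sha256(payload).hexdigest().encode("utf-8")
    row = DirectRow(row_key=row_key)
    row.set_cell("cf1", b"payload", payload,
                 timestamp=datetime.datetime.utcnow())
    return row


def run():
    # Add runner/region/temp_location options when submitting to Dataflow.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (p
         | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
         | "ToBigtableRow" >> beam.Map(to_bigtable_row)
         | "WriteToBigtable" >> WriteToBigTable(
               project_id=PROJECT, instance_id=INSTANCE, table_id=TABLE))


if __name__ == "__main__":
    run()
```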
Another thing to consider is that Dataflow can guarantee exactly-once processing of your data, so that no duplicates end up in Bigtable. Cloud Functions will not do this for you out of the box. If you go with the functions approach, you will want to make sure that the Pub/Sub message consistently determines which Bigtable cell is written to; that way, if the function gets retried several times, the same data will simply overwrite the same Bigtable cell.
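As a concrete illustration of that idempotency point, here is a minimal sketch of a Pub/Sub-triggered background Cloud Function (Python, 1st gen event signature). It derives the row key from the Pub/Sub message ID and writes the cell with a fixed timestamp, so a retried delivery of the same message overwrites the same cell instead of adding a duplicate. The project, instance, table, and column-family names are placeholders.

```python
import base64
import datetime

from google.cloud import bigtable

# Placeholder resource names; the client is created once at module load
# and reused across invocations of the same function instance.
client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("events")

# A fixed cell timestamp means a retry rewrites the same cell version
# rather than appending another version of the same data.
FIXED_TS = datetime.datetime(1970, 1, 1)


def pubsub_to_bigtable(event, context):
    """Background Cloud Function triggered by a Pub/Sub message."""
    payload = base64.b64decode(event["data"])

    # For Pub/Sub triggers, context.event_id is the Pub/Sub message ID,
    # so a redelivery of the same message maps to the same row.
    row = table.direct_row(context.event_id.encode("utf-8"))
    row.set_cell("cf1", b"payload", payload, timestamp=FIXED_TS)
    row.commit()
```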