Problem description
As a continuation of an earlier post, this is a bit of a capstone-style question to solidify my understanding of gae-datastore and get some critiques of my data modeling decisions. I'll be modifying the Jukebox example created by @Jimmy Kane to better reflect my real-world case.
In the original setup:
Imagine that you have a jukebox with queues per room, let's say, and people are queueing songs to each queue of each jukebox.
J=Jukebox, Q=queue, S=Song

         Jukebox
       /    |    \
     Q1     Q2    Q3
    / | \         | \
  S1  S2  S3      S4 S5
First, fill out the Song model as such:

from google.appengine.ext import ndb

class Song(ndb.Model):
    user_key = ndb.KeyProperty()  # key of the User who queued the song
    status = ndb.StringProperty()  # "active" or "inactive"
    datetime_added = ndb.DateTimeProperty()
My modification is to add a User that can CUD (create, update, delete) songs in any queue. In the frontend, users will visit a UI to see their songs in each of the queues and make changes. In the backend, the application needs to know which songs are in each queue, play the right song off each queue, and remove songs from queues once played.
In order for a User to be able to see its songs in the queues, I'm presuming each User would be a root entity and would need to store a list of Song keys:

class User(ndb.Model):
    song_keys = ndb.KeyProperty(kind='Song', repeated=True)
Then, to retrieve the user's songs, the application would do the following (presuming user_id is known):

user = User.get_by_id(user_id)
songs = ndb.get_multi(user.song_keys)
And, since gets are strongly consistent, the user would always see non-stale data.
Then, when queue 1 is finished playing a song, the application could do something like:

current_song.status = "inactive"
current_song.put()
queue_key = ndb.Key('Jukebox', '1', 'Queue', '1')
query = (Song.query(ancestor=queue_key)
         .filter(Song.status == "active")
         .order(Song.datetime_added))
next_song = query.get()
Am I right in thinking that the ancestor query ensures a consistent view of the preceding deactivation of the current song, as well as of any CUD from the Users?
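To make the selection rule in the query above concrete, here is a minimal plain-Python sketch of the same filter-and-order logic. It deliberately does not use ndb and says nothing about datastore consistency. `SongRecord` and `next_active_song` are hypothetical names introduced only for illustration.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical in-memory stand-in for a Song entity (not an ndb model).
SongRecord = namedtuple("SongRecord", ["name", "status", "datetime_added"])


def next_active_song(songs):
    """Return the oldest song whose status is "active", or None if there is none."""
    active = [s for s in songs if s.status == "active"]
    if not active:
        return None
    return min(active, key=lambda s: s.datetime_added)


# Made-up contents of one queue; the dates are arbitrary.
queue_1 = [
    SongRecord("S1", "inactive", datetime(2013, 1, 1, 12, 0)),
    SongRecord("S2", "active", datetime(2013, 1, 1, 12, 5)),
    SongRecord("S3", "active", datetime(2013, 1, 1, 12, 3)),
]

print(next_active_song(queue_1).name)  # S3
```

S1 is skipped because it is inactive, and S3 wins over S2 because it was added earlier, which is the behaviour the filter on status and the order on datetime_added are meant to give.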
The final step would be to update the User's song_keys list in a transaction:

# Run these lines inside a transaction, e.g. in a function decorated with @ndb.transactional.
user = current_song.user_key.get()
user.song_keys.remove(current_song.key)
user.put()
Summary and some pros/cons
- The consistency seems to be doing the right things in the right places, if my understanding is right?
- Should I be concerned about contention on the Jukebox entity group?
  - I wouldn't expect it to be a high-throughput use case, but my real-life scenario needs to scale with the number of users, and there are probably a similar number of queues as there are users, maybe 2x - 5x more users than queues. If the whole group is limited to 1 write/sec, and lots of users as well as each queue could be creating and updating songs, this could be a bottleneck.
  - One solution could be to do away with the Jukebox root entity and have each Queue be its own root entity.
- User.song_keys could be long-ish, say 100 song.keys. It is recommended to avoid storing overly long lists of keys in a ListProperty. What is the concern here? Is this a db concept, and how does it relate to the way ndb handles lists with the repeated=True property option?
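The contention concern in the list above can be put in rough numbers. The sketch below is plain arithmetic, not a benchmark: it assumes writes are spread evenly across queues, and the function name, parameters, and example load are made up for illustration. The 1 write/sec figure is the per-entity-group limit mentioned in the question.

```python
def writes_per_group(total_writes_per_sec, num_queues, single_root):
    """Estimate sustained writes/sec landing on each entity group.

    single_root=True models one Jukebox root that holds every queue.
    single_root=False models each Queue as its own root entity.
    Assumes writes are spread evenly across queues (a simplification).
    """
    groups = 1 if single_root else num_queues
    return total_writes_per_sec / float(groups)


GROUP_WRITE_LIMIT = 1.0  # approximate writes/sec per entity group

# Made-up load: 200 queues and 50 song writes per second overall.
for single_root in (True, False):
    rate = writes_per_group(50, 200, single_root)
    label = "Jukebox root" if single_root else "Queue roots"
    verdict = "over limit" if rate > GROUP_WRITE_LIMIT else "within limit"
    print(label, rate, verdict)
```

Under these assumptions a single Jukebox root would have to absorb all 50 writes/sec, while per-Queue roots see 0.25 writes/sec each. Real traffic is rarely spread this evenly, so a popular queue could still exceed the limit on its own.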
Opinions on this approach, or critiques of things I'm fundamentally misunderstanding?
- Presumably, I could alternatively just flip the data models symmetrically: have entity groups that look like User -> Song, and store song_keys lists in the Queue model.
Recommended answer
I think you should reconsider how important strong consistency is for your use case. From what I can see, it is not critical that all these entities have strong consistency. In my opinion, eventual consistency will work just fine. Most of the time you will see up-to-date data, and only sometimes (read: really, really rarely) will you see some stale data. Think about how critical it is that you always get up-to-date data versus how much it penalizes your application. Entities that need strong consistency are not stored in the most efficient way in terms of number of reads per second.
Also, if you look at the document Structuring Data for Strong Consistency, you will see that it mentions that you can't have more than about 1 write per second per entity group when using that approach.
Also, having entity groups affects data locality, as per the AppEngine Model Class docs.
If you also read Google's famous paper on Google Spanner, section 2, you will see how they deal with entities that have the same parent key. Essentially, they are put closer together. I assume Google might be using a similar approach with AppEngine Datastore. According to this source, Google might use Spanner for AppEngine Datastore at some point in the future.
Another point: there is no cheaper or faster get than a get by key. Having said this, if you can somehow avoid querying, this could reduce the cost of running your application. Assuming that you're developing a web application, you can store your song keys in a JSON/text object and then use the Prospective Search API to get up-to-date results. This approach requires a bit more work and requires you to embrace an eventual consistency model, as the data might be slightly out of date by the time it reaches the client. Depending on your use case (this obviously does not apply to a small application with a small user base), the savings might outweigh the cost. When I say the cost, I mean the fact that data might be slightly out of date.
In my experience, strong consistency is not a requirement for a large number of applications. The number of applications that can live with slightly stale data seems to outnumber the applications that cannot. Take YouTube, for example: I don't really mind if I don't see all the videos immediately (there are so many that I can't even know whether I'm seeing all of them or not). When you design something like this, first ask yourself: is it really necessary to provide up-to-date data, or is slightly stale data good enough? Can the user even tell the difference? Up-to-date data is much more expensive than slightly stale data.