Problem description
I am working with Google App Engine and using the low-level Java API to access Bigtable. I'm building a SaaS application with 4 layers:
- Client web browser
- RESTful resources layer
- Business layer
- Data access layer
I'm building an application to help manage my mobile auto detailing company (and others like it). I have to represent these four separate concepts, but I am unsure whether my current plan is a good one:
- Appointments
- Line Items
- Invoices
- Payments
Appointment: An "Appointment" is a place and time where employees are expected to be in order to deliver a service.
Line Item: A "Line Item" is a service, fee, or discount and its associated information. An example of line items that might go into an appointment:
Name                               Price   Commission   Time estimate
Full Detail, Regular Size          $160    $75          3.5 hours
$10 Off Full Detail Coupon         -$10    $0           0 hours
Premium Detail                     $220    $110         4.5 hours
Derived totals (not a line item)   $370    $185         8.0 hours
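To make the "derived totals" row concrete, here is a minimal sketch of how the totals could be computed from the line items rather than stored. The class and field names (LineItemTotals, LineItem, priceCents and so on) are hypothetical and not part of the original design; amounts are kept in cents as an assumption so that sums stay exact.

```java
import java.util.Arrays;
import java.util.List;

final class LineItemTotals {
    // Hypothetical in-memory model of one row of the table above.
    static final class LineItem {
        final String name;
        final int priceCents;        // negative for a discount
        final int commissionCents;
        final double estimatedHours;

        LineItem(String name, int priceCents, int commissionCents, double estimatedHours) {
            this.name = name;
            this.priceCents = priceCents;
            this.commissionCents = commissionCents;
            this.estimatedHours = estimatedHours;
        }
    }

    // The three example rows from the table.
    static List<LineItem> sampleItems() {
        return Arrays.asList(
            new LineItem("Full Detail, Regular Size", 16000, 7500, 3.5),
            new LineItem("$10 Off Full Detail Coupon", -1000, 0, 0.0),
            new LineItem("Premium Detail", 22000, 11000, 4.5));
    }

    static int totalPriceCents(List<LineItem> items) {
        int sum = 0;
        for (LineItem item : items) sum += item.priceCents;
        return sum;
    }

    static int totalCommissionCents(List<LineItem> items) {
        int sum = 0;
        for (LineItem item : items) sum += item.commissionCents;
        return sum;
    }

    static double totalHours(List<LineItem> items) {
        double sum = 0.0;
        for (LineItem item : items) sum += item.estimatedHours;
        return sum;
    }

    public static void main(String[] args) {
        List<LineItem> items = sampleItems();
        System.out.println("Total price (cents): " + totalPriceCents(items));
        System.out.println("Total commission (cents): " + totalCommissionCents(items));
        System.out.println("Total hours: " + totalHours(items));
    }
}
```

Because the totals are derived on demand, a coupon or an extra service only needs to be written once as a line item.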
Invoice: An "Invoice" is a record of one or more line items that a customer has committed to pay for.
Payment: A "Payment" is a record of what payments have come in.
In a previous implementation of this application, life was simpler and I treated all four of these concepts as one table in a SQL database: "Appointment." One "Appointment" could have multiple line items, multiple payments, and one invoice. The invoice was just an e-mail or printout that was produced from the line items and customer record.
9 times out of 10, this worked fine. When one customer made one appointment for one or a few vehicles and paid for it themselves, all was grand. But this system didn't work under a lot of conditions. For example:
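- When one customer made one appointment, but the appointment got rained out halfway through so that the detailer had to come back the next day, I needed two appointments, but only one line item, one invoice, and one payment.
- When a group of customers at an office all decided to have their cars done on the same day in order to get a discount, I needed one appointment, but multiple invoices and multiple payments.
- When one customer paid for two appointments with one check, I needed two appointments, but only one invoice and one payment.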
I was able to handle all of these outliers by fudging things a little. For example, if a detailer had to come back the next day, I'd just make another appointment on the second day with a line item that said "Finish Up" and a cost of $0. Or if one customer paid for two appointments with one check, I'd put split payment records in each appointment. The problem with this is that it creates a huge opportunity for data inconsistency. Data inconsistency can be a serious problem, especially in cases involving financial information, such as the third example, where the customer paid for two appointments with one check. Payments must be matched up directly with goods and services rendered in order to properly keep track of accounts receivable.
Below is a normalized structure for organizing and storing this data. Perhaps because of my inexperience, I place a lot of emphasis on data normalization because it seems like a great way to avoid data inconsistency errors. With this structure, changes to the data can be done with one operation without having to worry about updating other tables. Reads, however, can require multiple queries coupled with in-memory organization of the data. I figure that later on, if there are performance issues, I can add some denormalized fields to "Appointment" for faster querying while keeping the "safe" normalized structure intact. Denormalization could potentially slow down writes, but I was thinking that I might be able to make asynchronous calls to other resources or add to the task queue so that the client does not have to wait for the extra writes that update the denormalized portions of the data.
Tables

Appointment
    start_time
    etc...
Invoice
    due_date
    etc...
Payment
    invoice_Key_List
    amount_paid
    etc...
Line_Item
    appointment_Key_List
    invoice_Key
    name
    price
    etc...
The following is the series of queries and operations required to tie all four entities (tables) together for a given list of appointments. This would include information on what services were scheduled for each appointment, the total cost of each appointment, and whether or not payment has been received for each appointment. This would be a common query when loading the calendar for appointment scheduling or for a manager to get an overall view of operations.
- QUERY for a list of appointments whose start_time field lies within the given range.
- Add each key from the returned appointments into a List.
- Add each invoice_key from all of the line items into a Set collection.
- Add each key from the returned invoices into a List.
...As you can see, this operation requires 4 datastore queries as well as some in-memory organization (hopefully the in-memory part will be pretty fast).
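The in-memory organization step could look roughly like the sketch below, written in plain Java with no datastore calls. Everything here is an assumption made for illustration: the class names, the use of String keys, and in particular the rule that an appointment counts as paid when every invoice referenced by its line items appears in at least one payment's invoice key list. A line item shared by two appointments (the rained-out case) is counted under each of them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AppointmentSummaries {
    // Hypothetical stand-ins for rows already fetched from the datastore.
    static final class LineItemRow {
        final List<String> appointmentKeys;
        final String invoiceKey;
        final int priceCents;

        LineItemRow(List<String> appointmentKeys, String invoiceKey, int priceCents) {
            this.appointmentKeys = appointmentKeys;
            this.invoiceKey = invoiceKey;
            this.priceCents = priceCents;
        }
    }

    static final class PaymentRow {
        final List<String> invoiceKeys;
        final int amountPaidCents;

        PaymentRow(List<String> invoiceKeys, int amountPaidCents) {
            this.invoiceKeys = invoiceKeys;
            this.amountPaidCents = amountPaidCents;
        }
    }

    static final class Summary {
        int totalPriceCents = 0;
        boolean allInvoicesPaid = true; // stays true when there are no line items
    }

    static Map<String, Summary> summarize(List<String> appointmentKeys,
                                          List<LineItemRow> lineItems,
                                          List<PaymentRow> payments) {
        // Assumption: an invoice is "paid" if any payment references it.
        Set<String> paidInvoices = new HashSet<>();
        for (PaymentRow payment : payments) paidInvoices.addAll(payment.invoiceKeys);

        Map<String, Summary> summaries = new LinkedHashMap<>();
        for (String key : appointmentKeys) summaries.put(key, new Summary());

        for (LineItemRow item : lineItems) {
            for (String appointmentKey : item.appointmentKeys) {
                Summary summary = summaries.get(appointmentKey);
                if (summary == null) continue; // appointment outside the requested range
                summary.totalPriceCents += item.priceCents;
                if (!paidInvoices.contains(item.invoiceKey)) summary.allInvoicesPaid = false;
            }
        }
        return summaries;
    }

    public static void main(String[] args) {
        List<LineItemRow> items = Arrays.asList(
            new LineItemRow(Arrays.asList("a1"), "inv1", 16000),
            new LineItemRow(Arrays.asList("a1"), "inv1", -1000),
            new LineItemRow(Arrays.asList("a2"), "inv2", 22000));
        List<PaymentRow> payments = Arrays.asList(
            new PaymentRow(Arrays.asList("inv1"), 15000));
        Map<String, Summary> result = summarize(Arrays.asList("a1", "a2"), items, payments);
        for (Map.Entry<String, Summary> e : result.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().totalPriceCents
                + " cents, paid=" + e.getValue().allInvoicesPaid);
        }
    }
}
```

This is a single pass over the line items plus one pass over the payments, so for a calendar's worth of appointments the in-memory work should be small compared with the datastore round trips.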
Can anyone comment on this design? This is the best I could come up with, but I suspect there might be better options or completely different designs that I'm not thinking of, ones that might work better in general or specifically under GAE's (Google App Engine's) strengths, weaknesses, and capabilities.
Thanks!
Most applications are more read-intensive; some are more write-intensive. Below, I describe typical use cases and break down the operations that the user would want to perform:
Manager gets a call from a customer:
- Read - Manager loads the calendar and looks for a time that is available
- Write - Manager asks the customer for their information. I pictured this as a succession of asynchronous reads as the manager enters each piece of information, such as phone number, name, e-mail, address, etc. Or, if necessary, perhaps one write at the end, after the client application has gathered all of the information and submitted it.
- Write - Manager takes down customer's credit card info and adds it to their record as a separate operation
- Write - Manager charges credit card and verifies that the payment went through
Manager makes an outgoing call:
- Read - Manager loads the calendar
- Read - Manager loads the appointment for the customer he wants to call
- Write - Manager clicks the "Call" button, a call is initiated, and a new CallRecord entity is written
- Read - Call server responds to the call request and reads the CallRecord to find out how to handle the call
- Write - Call server writes updated information to the CallRecord
- Write - When the call is closed, the call server makes another request to the server to update the CallRecord resource (note: this request is not time-critical)
Accepted answer: Both of the top two answers were very thoughtful and appreciated. I accepted the one with fewer votes in order to equalize their exposure as much as possible, however imperfectly.
Recommended answer
You specified two specific "views" your website needs to provide:
- Scheduling an appointment. Your current scheme should work just fine for this - you'll just need to do the first query you mentioned.
- Overall view of operations. I'm not really sure what this entails, but if you need to do the string of four queries you mentioned above to get this, then your design could use some improvement. Details below.
Four datastore queries in and of themselves aren't necessarily overboard. The problem in your case is that two of the queries are expensive and probably even impossible. I'll go through each query:
1. Getting a list of appointments - no problem. This query will be able to scan an index to efficiently retrieve the appointments in the date range you specify.
2. Get all line items for each appointment from #1 - this is a problem. This query requires that you do an IN query. IN queries are transformed into N sub-queries behind the scenes, so you'll end up with one query per appointment key from #1! These will be executed in parallel, so that isn't so bad. The main problem is that IN queries are limited to only a small list of values (up to just 30 values). If you have more than 30 appointment keys returned by #1, then this query will fail to execute!
3. Get all invoices referenced by line items - no problem. You are correct that this query is cheap because you can simply fetch all of the relevant invoices directly by key. (Note: this query is still synchronous - I don't think asynchronous was the word you were looking for.)
4. Get all payments for all invoices returned by #3 - this is a problem. Like #2, this query will be an IN query and will fail if #3 returns even a moderate number of invoices which you need to fetch payments for.

If the number of items returned by #1 and #3 is small enough, then GAE will almost certainly be able to do this within the allowed limits. And that should be good enough for your personal needs - it sounds like you mostly need it to work, and don't need it to scale to huge numbers of users (it won't).
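One possible workaround for the 30-value limit, which the answer itself does not propose, is to split the key list into batches and issue one IN query per batch. The sketch below shows only the batching step in plain Java; the KeyBatcher class, its method names, and the use of String keys are hypothetical, and the actual datastore query for each batch is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyBatcher {
    // Limit on IN query values as described in the answer above.
    static final int MAX_IN_VALUES = 30;

    // Splits keys into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> appointmentKeys = new ArrayList<>();
        for (int i = 0; i < 65; i++) appointmentKeys.add("appointment-" + i);

        for (List<String> batch : partition(appointmentKeys, MAX_IN_VALUES)) {
            // A real implementation would run one IN query per batch here.
            System.out.println("batch of " + batch.size() + " keys");
        }
    }
}
```

Batching keeps each query within the limit, but the total number of sub-queries is unchanged, so the cost concern raised in #2 and #4 still applies.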
Suggestions for improvement:
- Denormalization! Try storing the keys for Line_Item, Invoice, and Payment entities relevant to a given appointment in lists on the appointment itself. Then you can eliminate your IN queries. Make sure these new ListProperty fields are not indexed, to avoid problems with exploding indexes.
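As a rough illustration of the denormalized shape, the sketch below models an appointment that carries the keys of its related entities, so that everything needed for a list of appointments can be collected up front and fetched by key. It is plain Java with hypothetical names and String keys; with the low-level Java API, the lists would presumably be stored as unindexed properties on the Appointment entity (for example via Entity.setUnindexedProperty) and the collected keys passed to a batch get, but that part is an assumption and is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DenormalizedAppointment {
    final String key;
    // Keys of related entities, stored on the appointment itself.
    final List<String> lineItemKeys;
    final List<String> invoiceKeys;
    final List<String> paymentKeys;

    DenormalizedAppointment(String key, List<String> lineItemKeys,
                            List<String> invoiceKeys, List<String> paymentKeys) {
        this.key = key;
        this.lineItemKeys = lineItemKeys;
        this.invoiceKeys = invoiceKeys;
        this.paymentKeys = paymentKeys;
    }

    // Collects every related key across the appointments, without duplicates,
    // so they could be loaded with one batch get by key instead of IN queries.
    static Set<String> relatedKeys(List<DenormalizedAppointment> appointments) {
        Set<String> keys = new LinkedHashSet<>();
        for (DenormalizedAppointment appointment : appointments) {
            keys.addAll(appointment.lineItemKeys);
            keys.addAll(appointment.invoiceKeys);
            keys.addAll(appointment.paymentKeys);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Two appointments sharing one line item and invoice (the rained-out case).
        DenormalizedAppointment day1 = new DenormalizedAppointment("a1",
            Arrays.asList("li1", "li2"), Arrays.asList("inv1"), Arrays.asList("pay1"));
        DenormalizedAppointment day2 = new DenormalizedAppointment("a2",
            Arrays.asList("li1"), Arrays.asList("inv1"), Collections.<String>emptyList());
        System.out.println(relatedKeys(Arrays.asList(day1, day2)));
    }
}
```

The trade-off is the one the question already anticipates: every write that adds a line item, invoice, or payment must also update the key lists on the affected appointments.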
Other less specific ideas for improvement:
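- Depending on what your "overall view of operations" is going to show, you might be able to split up the retrieval of all this information. For example, perhaps you start by showing a list of appointments, and then when the manager wants more information about a particular appointment, you go ahead and fetch the information relevant to that appointment. You could even do this via AJAX if you want this interaction to take place on a single page.
- Memcache is your friend - use it to cache the results of datastore queries (or even higher-level results) so that you don't have to recompute them from scratch on every access.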
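The caching idea follows the usual cache-aside pattern, sketched below with an in-process HashMap standing in for Memcache. The QueryCache class and its methods are hypothetical, the sketch is not thread-safe, and it has no expiry; a real version would use App Engine's memcache service instead of a local map and would need to invalidate entries whenever the underlying entities are written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class QueryCache<K, V> {
    private final Map<K, V> cache = new HashMap<>(); // stand-in for Memcache
    private final Function<K, V> loader;             // the expensive query
    private int loads = 0;

    QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, running the loader only on a miss.
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    // Call after writing to the underlying data so stale results are dropped.
    void invalidate(K key) {
        cache.remove(key);
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        // The loader here is a placeholder for "run the four queries and summarize".
        QueryCache<String, String> calendar =
            new QueryCache<String, String>(week -> "summary for " + week);
        System.out.println(calendar.get("2011-W14"));
        System.out.println(calendar.get("2011-W14"));
        System.out.println("loads: " + calendar.loadCount());
    }
}
```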