Database development mistakes made by application developers

Question

What are common database development mistakes made by application developers?

Recommended answer

1. Not using appropriate indexes

This is a relatively easy one but still it happens all the time. Foreign keys should have indexes on them. If you're using a field in a WHERE you should (probably) have an index on it. Such indexes should often cover multiple columns based on the queries you need to execute.
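To make the multi-column point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and the index name are hypothetical, and the exact plan text differs between SQLite versions and between database engines, so treat it as an illustration rather than a recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        status TEXT
    );
""")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM orders WHERE customer_id = 42 AND status = 'OPEN'"
plan_before = plan(query)

# One index covering both columns that appear in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the plan is a scan of the whole table; with it, the plan searches the index instead.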

2. Not enforcing referential integrity

Your database may vary here but if your database supports referential integrity--meaning that all foreign keys are guaranteed to point to an entity that exists--you should be using it.

It's quite common to see this failure on MySQL databases. I don't believe MyISAM supports it. InnoDB does. You'll find people who are using MyISAM or those that are using InnoDB but aren't using it anyway.
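The same "supported but not used" trap exists elsewhere. As a sketch, SQLite (used here through Python's sqlite3 module purely for illustration, with hypothetical `authors` and `books` tables) only enforces foreign keys once a pragma is switched on for the connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ada');
""")

# A row pointing at an existing author is accepted.
conn.execute("INSERT INTO books (title, author_id) VALUES ('Valid', 1)")

# A row pointing at a missing author is rejected by the database itself.
try:
    conn.execute("INSERT INTO books (title, author_id) VALUES ('Orphan', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(orphan_rejected, book_count)
```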

3. Using natural rather than surrogate (technical) primary keys

Natural keys are keys based on externally meaningful data that is (ostensibly) unique. Common examples are product codes, two-letter state codes (US), social security numbers and so on. Surrogate or technical primary keys are those that have absolutely no meaning outside the system. They are invented purely for identifying the entity and are typically auto-incrementing fields (SQL Server, MySQL, others) or sequences (most notably Oracle).

In my opinion you should always use surrogate keys.

This is a somewhat controversial topic on which you won't get universal agreement. While you may find some people who think natural keys are OK in some situations, you won't find any criticism of surrogate keys other than that they are arguably unnecessary. That's quite a small downside if you ask me.

Remember, even countries can cease to exist (for example, Yugoslavia).

4. Writing queries that require DISTINCT to work

You often see this in ORM-generated queries. Look at the log output from Hibernate and you'll see all the queries begin with:

SELECT DISTINCT ...

This is a bit of a shortcut to ensuring you don't return duplicate rows and thus get duplicate objects. You'll sometimes see people doing this as well. If you see it too much it's a real red flag. Not that DISTINCT is bad or doesn't have valid applications. It does (on both counts) but it's not a surrogate or a stopgap for writing correct queries.

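From Why I Hate DISTINCT:

Where things start to go sour in my opinion is when a developer is building a substantial query, joining tables together, and all of a sudden he realizes that it looks like he is getting duplicate (or even more) rows and his immediate response... his "solution" to this "problem" is to throw on the DISTINCT keyword and POOF all his troubles go away.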
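A small sketch of that situation, using Python's sqlite3 module and hypothetical `customers` and `orders` tables: the join manufactures the duplicates, DISTINCT hides them afterwards, and a query that asks the actual question never produces them. The EXISTS rewrite is just one possible fix, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 1);
""")

# Joining to the "many" side repeats Ann once per order.
duplicated = conn.execute("""
    SELECT c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()

# The stopgap: hide the duplicates after they have been produced.
patched = conn.execute("""
    SELECT DISTINCT c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()

# Asking the real question ("which customers have an order?") never creates them.
semi_join = conn.execute("""
    SELECT c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()

print(len(duplicated), patched, semi_join)
```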

5. Favouring aggregation over joins

Another common mistake by database application developers is to not realize how much more expensive aggregation (ie the GROUP BY clause) can be compared to joins.

To give you an idea of how widespread this is, I've written on this topic several times here and been downvoted a lot for it. For example:

From SQL statement - "Join" vs "Group By and Having":

First query:

SELECT userid
FROM userrole
WHERE roleid IN (1, 2, 3)
GROUP BY userid
HAVING COUNT(1) = 3

Query time: 0.312 s

Second query:

SELECT t1.userid
FROM userrole t1
JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2
JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3
AND t1.roleid = 1

Query time: 0.016 s

That's right. The join version I proposed is twenty times faster than the aggregate version.
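If you want to check that the two forms really answer the same question before comparing their speed, a sketch like the following works. It uses Python's sqlite3 module with made-up sample rows, and it assumes (userid, roleid) is unique, which is what makes COUNT(1) = 3 mean "has all three roles". It says nothing about timing; the figures above came from the original data set and will differ by engine and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userrole (
        userid INTEGER NOT NULL,
        roleid INTEGER NOT NULL,
        PRIMARY KEY (userid, roleid)  -- assumption: no duplicate role rows
    );
    INSERT INTO userrole VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2),
        (3, 1), (3, 2), (3, 3), (3, 4);
""")

# Aggregate form: count the matching role rows per user.
aggregate = conn.execute("""
    SELECT userid FROM userrole
    WHERE roleid IN (1, 2, 3)
    GROUP BY userid
    HAVING COUNT(1) = 3
    ORDER BY userid
""").fetchall()

# Join form: one self-join per required role.
joined = conn.execute("""
    SELECT t1.userid
    FROM userrole t1
    JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2
    JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3
    WHERE t1.roleid = 1
    ORDER BY t1.userid
""").fetchall()

print(aggregate, joined)
```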

6. Not simplifying complex queries through views

Not all database vendors support views but for those that do, they can greatly simplify queries if used judiciously. For example, on one project I used a generic Party model for CRM. This is an extremely powerful and flexible modelling technique but can lead to many joins. In this model there were:

  • Party: people and organisations;
  • Party Role: things those parties did, for example Employee and Employer;
  • Party Role Relationship: how those roles related to each other.

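Example:

  • Ted is a Person, which is a subtype of Party;
  • Ted has many roles, one of which is Employee;
  • Intel is an organisation, which is a subtype of Party;
  • Intel has many roles, one of which is Employer;
  • Intel employs Ted, meaning there is a relationship between their respective roles.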

So there are five tables joined to link Ted to his employer. You assume all employees are Persons (not organisations) and provide this helper view:

CREATE VIEW vw_employee AS
SELECT p.title, p.given_names, p.surname, p.date_of_birth, p2.party_name employer_name
FROM person p
JOIN party py ON py.id = p.id
JOIN party_role child ON p.id = child.party_id
JOIN party_role_relationship prr ON child.id = prr.child_id AND prr.type = 'EMPLOYMENT'
JOIN party_role parent ON parent.id = prr.parent_id
JOIN party p2 ON parent.party_id = p2.id

And suddenly you have a very simple view of the data you want but on a highly flexible data model.
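Here is a rough, runnable version of the Ted and Intel example using Python's sqlite3 module. The column definitions are my guess at a minimal schema (the real project's tables are not shown), and the view joins the parent role on `parent.id = prr.parent_id`. Once the view exists, callers query it like a flat employee table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema for the Party model.
    CREATE TABLE party (id INTEGER PRIMARY KEY, party_name TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY REFERENCES party(id),
        title TEXT, given_names TEXT, surname TEXT, date_of_birth TEXT
    );
    CREATE TABLE party_role (
        id INTEGER PRIMARY KEY,
        party_id INTEGER REFERENCES party(id),
        role_type TEXT
    );
    CREATE TABLE party_role_relationship (
        child_id INTEGER REFERENCES party_role(id),
        parent_id INTEGER REFERENCES party_role(id),
        type TEXT
    );

    CREATE VIEW vw_employee AS
    SELECT p.title, p.given_names, p.surname, p.date_of_birth,
           p2.party_name AS employer_name
    FROM person p
    JOIN party py ON py.id = p.id
    JOIN party_role child ON p.id = child.party_id
    JOIN party_role_relationship prr
      ON child.id = prr.child_id AND prr.type = 'EMPLOYMENT'
    JOIN party_role parent ON parent.id = prr.parent_id
    JOIN party p2 ON parent.party_id = p2.id;

    INSERT INTO party VALUES (1, 'Ted Example'), (2, 'Intel');
    INSERT INTO person VALUES (1, 'Mr', 'Ted', 'Example', '1980-01-01');
    INSERT INTO party_role VALUES (10, 1, 'EMPLOYEE'), (20, 2, 'EMPLOYER');
    INSERT INTO party_role_relationship VALUES (10, 20, 'EMPLOYMENT');
""")

# The five-table join is hidden behind one simple SELECT.
rows = conn.execute(
    "SELECT given_names, surname, employer_name FROM vw_employee"
).fetchall()
print(rows)
```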

7. Not sanitizing input

This is a huge one. Now I like PHP but if you don't know what you're doing it's really easy to create sites vulnerable to attack. Nothing sums it up better than the story of little Bobby Tables.

Data provided by the user by way of URLs, form data and cookies should always be treated as hostile and sanitized. Make sure you're getting what you expect.

8. Not using prepared statements

Prepared statements are when you compile a query minus the data used in inserts, updates and WHERE clauses and then supply that later. For example:

SELECT * FROM users WHERE username = 'bob'

vs

SELECT * FROM users WHERE username = ?

or

SELECT * FROM users WHERE username = :username

depending on your platform.

I've seen databases brought to their knees by applications that skip prepared statements. Basically, each time a modern database encounters a new query it has to compile it. If it encounters a query it has seen before, the database has the opportunity to reuse the cached compiled query and execution plan. By running the same query many times you're giving the database the chance to figure that out and optimize accordingly (for example, by pinning the compiled query in memory).

Using prepared statements will also give you meaningful statistics about how often certain queries are used.

Prepared statements will also better protect you against SQL injection attacks.
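To show the injection side of this, here is a short sketch with Python's sqlite3 module. The `users` table and the hostile string are invented for the example, and placeholder syntax varies between drivers, but the idea carries over: the SQL text stays constant and the value travels separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    INSERT INTO users (username) VALUES ('bob'), ('alice');
""")

hostile = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes the query's meaning.
unsafe_sql = "SELECT username FROM users WHERE username = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is bound to a positional placeholder and treated as plain data.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (hostile,)
).fetchall()

# Named placeholders work the same way.
named_rows = conn.execute(
    "SELECT username FROM users WHERE username = :username", {"username": "bob"}
).fetchall()

print(len(unsafe_rows), safe_rows, named_rows)
```

The concatenated version returns every user, while the bound version correctly finds nobody with that odd username.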

9. Not normalizing enough

Database normalization is basically the process of optimizing database design or how you organize your data into tables.

Just this week I ran across some code where someone had imploded an array and inserted it into a single field in a database. Normalizing that would mean treating each element of that array as a separate row in a child table (i.e. a one-to-many relationship).

This also came up in Best method for storing a list of user IDs:

I've seen in other systems that the list is stored in a serialized PHP array.

But lack of normalization comes in many forms.
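As an illustration of the imploded-array case, this sketch (Python's sqlite3 module, hypothetical `teams_flat`, `teams` and `team_members` tables) stores the same membership list both ways and then asks whether user 2 is a member:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not normalized: a list crammed into one column.
    CREATE TABLE teams_flat (id INTEGER PRIMARY KEY, user_ids TEXT);
    INSERT INTO teams_flat VALUES (1, '1,12,123');

    -- Normalized: one row per member in a child table.
    CREATE TABLE teams (id INTEGER PRIMARY KEY);
    CREATE TABLE team_members (
        team_id INTEGER NOT NULL REFERENCES teams(id),
        user_id INTEGER NOT NULL,
        PRIMARY KEY (team_id, user_id)
    );
    INSERT INTO teams VALUES (1);
    INSERT INTO team_members VALUES (1, 1), (1, 12), (1, 123);
""")

# Searching the flat column needs string matching, and a naive pattern
# wrongly "finds" user 2 inside '12' and '123'.
flat_hit = conn.execute(
    "SELECT COUNT(*) FROM teams_flat WHERE user_ids LIKE '%2%'"
).fetchone()[0]

# The child table answers the same question exactly.
normalized_hit = conn.execute(
    "SELECT COUNT(*) FROM team_members WHERE user_id = 2"
).fetchone()[0]

print(flat_hit, normalized_hit)
```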

10. Normalizing too much

This may seem like a contradiction to the previous point but normalization, like many things, is a tool. It is a means to an end and not an end in and of itself. I think many developers forget this and start treating a "means" as an "end". Unit testing is a prime example of this.

I once worked on a system that had a huge hierarchy for clients that went something like:

Licensee ->  Dealer Group -> Company -> Practice -> ...

such that you had to join about 11 tables together before you could get any meaningful data. It was a good example of normalization taken too far.

More to the point, careful and considered denormalization can have huge performance benefits but you have to be really careful when doing this.

11. Using exclusive arcs

An exclusive arc is a common mistake where a table is created with two or more foreign keys where one and only one of them can be non-null. Big mistake. For one thing it becomes that much harder to maintain data integrity. After all, even with referential integrity, nothing is preventing two or more of these foreign keys from being set (complex check constraints notwithstanding).
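To see the integrity gap, consider this sketch with Python's sqlite3 module and hypothetical `person`, `company` and account tables. Foreign keys alone accept a row with both owners set; only an extra CHECK constraint stops it. This is not an endorsement of the design, just a demonstration of the additional machinery it demands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE company (id INTEGER PRIMARY KEY);
    INSERT INTO person VALUES (1);
    INSERT INTO company VALUES (1);

    -- Exclusive arc: the owner is meant to be a person OR a company.
    CREATE TABLE account_arc (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        company_id INTEGER REFERENCES company(id)
    );

    -- The same table with the extra constraint the arc needs:
    -- exactly one of the two columns must be NULL.
    CREATE TABLE account_checked (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        company_id INTEGER REFERENCES company(id),
        CHECK ((person_id IS NULL) <> (company_id IS NULL))
    );
""")

# Referential integrity alone accepts a row with both owners set.
conn.execute("INSERT INTO account_arc (person_id, company_id) VALUES (1, 1)")
arc_rows = conn.execute("SELECT COUNT(*) FROM account_arc").fetchone()[0]

try:
    conn.execute("INSERT INTO account_checked (person_id, company_id) VALUES (1, 1)")
    both_rejected = False
except sqlite3.IntegrityError:
    both_rejected = True

print(arc_rows, both_rejected)
```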

12. Not doing performance analysis on queries at all

Pragmatism reigns supreme, particularly in the database world. If you're sticking to principles to the point that they've become a dogma then you've quite probably made mistakes. Take the example of the aggregate queries from above. The aggregate version might look "nice" but its performance is woeful. A performance comparison should've ended the debate (but it didn't) but more to the point: spouting such ill-informed views in the first place is ignorant, even dangerous.

13. Over-reliance on UNION ALL and particularly UNION constructs

A UNION in SQL terms merely concatenates congruent data sets, meaning they have the same type and number of columns. The difference between them is that UNION ALL is a simple concatenation and should be preferred wherever possible whereas a UNION will implicitly do a DISTINCT to remove duplicate tuples.

UNIONs, like DISTINCT, have their place. There are valid applications. But if you find yourself doing a lot of them, particularly in subqueries, then you're probably doing something wrong. That might be a case of poor query construction or a poorly designed data model forcing you to do such things.

UNIONs, particularly when used in joins or dependent subqueries, can cripple a database. Try to avoid them whenever possible.
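The difference between the two is easy to see on a tiny data set. This sketch uses Python's sqlite3 module with two hypothetical order tables that share one customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_orders (customer TEXT);
    CREATE TABLE archived_orders (customer TEXT);
    INSERT INTO current_orders VALUES ('ann'), ('bob');
    INSERT INTO archived_orders VALUES ('bob'), ('cat');
""")

# UNION ALL: plain concatenation, duplicates kept.
union_all = conn.execute("""
    SELECT customer FROM current_orders
    UNION ALL
    SELECT customer FROM archived_orders
""").fetchall()

# UNION: the database must also remove duplicate rows.
union = conn.execute("""
    SELECT customer FROM current_orders
    UNION
    SELECT customer FROM archived_orders
""").fetchall()

print(len(union_all), sorted(union))
```

The extra de-duplication step is the work you pay for with UNION, which is why UNION ALL is the better default when duplicates are impossible or acceptable.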

14. Using OR conditions in queries

This might seem harmless. After all, ANDs are OK. OR should be OK too, right? Wrong. Basically an AND condition restricts the data set whereas an OR condition grows it, but not in a way that lends itself to optimisation. This is particularly true when the different OR conditions might intersect, forcing the optimizer to effectively do a DISTINCT operation on the result.

Bad:

... WHERE a = 2 OR a = 5 OR a = 11

Better:

... WHERE a IN (2, 5, 11)

Now your SQL optimizer may effectively turn the first query into the second. But it might not. Just don't do it.

15. Not designing their data model to lend itself to high-performing solutions

This is a hard point to quantify. It is typically observed by its effect. If you find yourself writing gnarly queries for relatively simple tasks or that queries for finding out relatively straightforward information are not efficient, then you probably have a poor data model.

In some ways this point summarizes all the earlier ones but it's more of a cautionary tale that doing things like query optimisation is often done first when it should be done second. First and foremost you should ensure you have a good data model before trying to optimize the performance. As Knuth said:

Premature optimization is the root of all evil

16. Incorrect use of database transactions

All data changes for a specific process should be atomic, i.e. if the operation succeeds, it does so fully. If it fails, the data is left unchanged. There should be no possibility of 'half-done' changes.

Ideally, the simplest way to achieve this is that the entire system design should strive to support all data changes through single INSERT/UPDATE/DELETE statements. In this case, no special transaction handling is needed, as your database engine should do so automatically.

However, if any processes do require multiple statements be performed as a unit to keep the data in a consistent state, then appropriate Transaction Control is necessary.

It is also recommended to pay careful attention to the subtleties of how your database connectivity layer and database engine interact in this regard.
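A sketch of a multi-statement unit of work, using Python's sqlite3 module and a hypothetical `accounts` table. It also shows one of those connectivity-layer subtleties: in this particular driver, using the connection as a context manager commits on success and rolls back on an exception. Other drivers and engines handle this differently, so check yours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts VALUES (1, 100), (2, 0);
""")

def transfer(amount):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the credit and the debit are applied together or not at all.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2", (amount,))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))

try:
    transfer(500)  # the debit violates the CHECK constraint
    failed = False
except sqlite3.IntegrityError:
    failed = True

# The credit from the failed transfer must not survive.
after_failure = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()

transfer(30)
after_success = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(failed, after_failure, after_success)
```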

17. Not understanding the 'set-based' paradigm

The SQL language follows a specific paradigm suited to specific kinds of problems. Various vendor-specific extensions notwithstanding, the language struggles to deal with problems that are trivial in languages like Java, C#, Delphi etc.

This lack of understanding manifests itself in a few ways.

Determine clear division of responsibility, and strive to use the appropriate tool to solve each problem.
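One way this tends to show up (my own example, since no specific cases are listed here) is row-by-row processing in application code where a single set-based statement would do. The sketch below uses Python's sqlite3 module and a hypothetical `products` table to show both approaches producing the same result:

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price INTEGER);
        INSERT INTO products VALUES (1, 'book', 100), (2, 'book', 200), (3, 'toy', 50);
    """)
    return conn

# Procedural habit: fetch the rows, loop in application code, update one by one.
row_by_row = make_db()
for product_id, price in row_by_row.execute(
    "SELECT id, price FROM products WHERE category = 'book'"
).fetchall():
    row_by_row.execute(
        "UPDATE products SET price = ? WHERE id = ?", (price * 2, product_id)
    )

# Set-based: describe the whole change in a single statement.
set_based = make_db()
set_based.execute("UPDATE products SET price = price * 2 WHERE category = 'book'")

query = "SELECT id, price FROM products ORDER BY id"
loop_result = row_by_row.execute(query).fetchall()
set_result = set_based.execute(query).fetchall()
print(loop_result, set_result)
```

The loop issues one statement per row plus the initial SELECT, while the set-based version is a single statement that the engine executes atomically.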
