本文介绍了更新关系数据的算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

已知有什么算法在存在数据库约束的情况下执行通过插入,更新和删除行来更新数据库的任务?



更具体地说,在要删除的行的图像,要插入的行的图像和要更新的行的两个图像在存储器中之前。行可能适用于多个表。确切的更新顺序是未知的或未被保存 - 只有数据库最终必须反映的前图像和后图像是已知的。



数据库包含主键,外键和唯一索引约束。问题是找到使数据库更新的命令序列。为了简单起见,我愿意指定一行的主键永远不会被修改。



数据库系统不支持延迟约束检查。 (这种数据库的解决方案很简单)。我还必须规定主键列在插入之后不能被更新,并且不允许删除行并用相同的主键重新插入一行,即使一些算法可能会发现它是方便的。 (这对于所有主键都由数据库系统自动生成的常见情况是必需的)



有什么算法:


  1. 假设外键约束必须始终强制执行,但不使用唯一索引。

  2. 假设外键约束和唯一索引约束必须


  3. 我要求两者,因为我认为#1可能会相当简单。



    编辑:这里的目标是解决在一般(或几乎通用的)方式缺乏延迟约束检查的问题。我认为高质量的ORM包必须这样做。
    我想要一个对算法的解释,由您在这里或外部在学术论文等提供。我不会考虑指向软件包或源代码的答案这个问题的答案。
    / p>

    NAIVE算法:



    循环显示表格,并为添加,更改或删除的每一行生成单个INSERT,UPDATE或DELETE语句,分别为每个。
    循环生成的语句并应用于数据库。如果语句无法应用,请继续其他语句。
    重试失败的语句。继续迭代,直到没有更多的故障或者通过成功执行没有语句。
    如果语句仍然存在,尝试临时调整有问题的列中的数据,以尝试让它们成功。



    这是一个丑陋的强力算法和算法出临时调整部分是自己的挑战。所以,我想要一个改进和完整的算法。



    EDIT2:



    RBarryYoung张贴答案(但没有雪茄)充分解决情境#1,同时解决最常见的情况#2问题。以下是我在应用程序中经常看到的但尚未到达解决方案的场景#1更新模式的示例。 DELETE / UPDATE-INSERT在场景#1中很多时候是正确的,但诀窍是要弄清楚何时偏离它。我也怀疑偏离它放大了每个情景#2的UNIQUE问题的发生,可能会增加我对解决情境#2的兴趣。



    请注意,也不修改任何主键。但是,父对象的外键被修改。

      CREATE TABLE A 

    AId INT NOT NULL PRIMARY KEY


    CREATE TABLE B

    BId INT NOT NULL PRIMARY KEY,
    AI d INT NOT NULL FOREIGN KEY REFERENCES A(AId)


    CREATE TABLE C

    CId INT NOT NULL PRIMARY KEY,
    AI d INT NOT NULL FOREIGN KEY REFERENCES A(AId),
    BI d INT NOT NULL FOREIGN KEY REFERENCES B(BId)

      A(1)
    B(1,1)
    C

    After Images:

      A(1)
    B(2,1)[待删除:(1,1)]
    C(1,1,2)

    排序顺序:A,B,C



    第一个命令是DELETE请注意,如果C中的第三列允许NULL(它不在C(1,1,1)中),则C(1,1)会失败。



    解决方案

    为什么你甚至试图这样做?正确的方法是让数据库引擎推迟对约束的检查,直到事务被提交。



    你提出的问题在一般情况下是难以解决的案件。如果你只考虑在你要在数据库中更新的行中的外键的传递闭包,那么只有在图形描述树的时候才能解决这个问题。如果图中有一个循环,并且可以通过用NULL替换外键值来中断循环,那么可以重写一个SQL并添加另一个SQL以稍后更新列。如果你不能用一个NULL替换一个键值,那么它不能被解决。



    正如我所说,正确的方法是关闭约束,直到所有的SQL都已运行,然后重新启动它们进行提交。如果不满足约束,则提交将失败。 Postgres(例如)有一个功能,这使这很容易。


    What algorithms are known to perform the task of updating a database by inserting, updating, and deleting rows in the presence of database constraints?

    More specifically, say that before images of rows to be deleted, after images of rows to be inserted, and both images of rows to be updated are in memory. The rows might be for several tables. An exact sequence of updates is either not known or has not been preserved -- only the before images and the after images that the database must ultimately be made to reflect are known.

    The database contains primary key, foreign key, and unique index constraints. The problem is to find the sequence of commands to bring the database up to date. For simplicity, I am willing to specify that the primary key of a row will never be modified.

    The database system does not support deferred constraint checking. (The solution is trivial for such databases). I must also make the rule that primary key columns may not be updated after insert and it is not allowed to delete a row and re-insert one with the same primary key, even though some algorithms might otherwise find it convenient to do so. (This is necessary for common scenarios where all primary keys are autogenerated by the database system.)

    What are the algorithms:

    1. Assuming that foreign key constraints must be enforced at all times but unique indexes are not used.
    2. Assuming that foreign key constraints and unique index constraints must be enforced at all times.

    I ask for both because I think #1 might be considerably simpler.

    EDIT: The goal here is to solve the problem of lack of deferred constraint checking in a general (or nearly general purpose) way. I suppose high-quality ORM packages must do this.I want an explanation of the algorithms, either provided by you here or externally in an academic paper, etc. I will not consider a pointer to a software package or source code an answer to this question.

    NAIVE ALGORITHM:

    Loop through the tables and for each row that is added, changed, or deleted generate a single INSERT, UPDATE, or DELETE statement, respectively for each.Loop through the generated statements and apply to the database. If a statement fails to apply, continue with the other statements.Retry the statements that have failed. Keep iterating until there are no more failures or a pass successfully executes no statements.If statements remain, attempt temporary tweaking of the data in the problematic columns to try to get them to succeed.

    This is an ugly brute force algorithm and figuring out the "temporary tweaking" part is a challenge of its own. So, I would like an improved and complete algorithm.

    EDIT2:

    RBarryYoung posts an answer which gets close (but no cigar) to fully solving scenario #1 while solving the most common scenario #2 problems simultaneously. The following is an example of a scenario #1 update pattern I have seen very often in applications but have not yet arrived at the solution. DELETE/UPDATE-INSERT is exactly right a lot of the time in scenario #1, but the trick is to figure out when to deviate from it. I also suspect that deviating from it magnifies the occurence of UNIQUE problems per scenario #2, possibly increasing my interest in solving scenario #2 as well.

    Notice that there are no cycles nor are any primary keys modified. However, a foreign key to a parent is modified.

    CREATE TABLE A
    (
        AId INT NOT NULL PRIMARY KEY
    )
    
    CREATE TABLE B
    (
        BId INT NOT NULL PRIMARY KEY,
        AId INT NOT NULL FOREIGN KEY REFERENCES A (AId)
    )
    
    CREATE TABLE C
    (
        CId INT NOT NULL PRIMARY KEY,
        AId INT NOT NULL FOREIGN KEY REFERENCES A (AId),
        BId INT NOT NULL FOREIGN KEY REFERENCES B (BId)
    )
    

    Before Images:

    A (1)
    B (1,1)
    C (1,1,1)
    

    After Images:

    A (1)
    B (2,1) [To be deleted: (1,1)]
    C (1,1,2)
    

    Sort Order: A,B,C

    The first command is DELETE B (1,1) which fails due to C (1,1,1).

    Note that if the third column in C allows NULL (which it does not in this case), a pure solution might involve NULLing it out in an early pass as this would allow the given algorithm to proceed normally and have its usual advantages for dealing with most of the scenario #2 problems. An excellent solution to this problem needs to take things like this into account as well. The full generality of this question is undoubtedly a fascinating problem.

    解决方案

    Why are you even trying to do this? The correct way to do it is to get the database engine to defer the checking of the constraints until the transaction is committed.

    The problem that you pose is intractable in the general case. If you consider just a transitive closure of the foreign keys in the rows you want to update in database then it is only possible to solve this where the graph describes a tree. If there is a cycle in the graph and you can break the cycle by replacing a foreign key value with a NULL then you can re-write one SQL and add another to later update the column. If you can't replace a key value with a NULL then it can't be solved.

    As I say, the correct way to do this is to turn off the constraints until all of the SQL has been run and then turn them back on for the commit. The commit will fail if the constraints aren't met. Postgres (for example) has a feature which makes this very easy.

    这篇关于更新关系数据的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-05 05:03
查看更多