本文介绍了如何标记一个大集“传递组”与约束?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

@NealB解决方案后编辑:@ NealB的解决方案是非常非常快的comparated与任何另外一个和免除有关添加约束,以提高性能这个新问题。在@ NealB不是需要任何改善,已经的 O(N)的时间,是非常简单的。

EDIT after @NealB solution: the @NealB's solution is very very fast comparated with any another one, and dispenses this new question about "add a constraint to improve performance". The @NealB's not need any improve, have O(n) time and is very simple.

的问题标签传递组,SQL 拥有的一个优雅的解决方案。但是这种解决方案会消耗指数时间(!)。我需要10000 itens工作:1000 itens需要1秒,与2000年需要一天的时间...

The problem of "label transitive groups with SQL" have an elegant solution using recursion and CTE... But this solution consumes an exponential time (!). I need to work with 10000 itens: with 1000 itens need 1 second, with 2000 need 1 day...

约束的:在我的情况可以把问题分解成块〜100 itens或者更少,但只能选择一个组〜10 itens,并放弃所有其他的〜90标itens ......

Constraint: in my case is possible to break the problem into pieces of ~100 itens or less, but only to select one group of ~10 itens, and discard all the other ~90 labeled itens...

有一个通用的algotithm添加和使用这种pre选择,减少了二次, O(N ^ 2)的时间?或许,作为显示用的意见和@wildplasser,一个的 O(N日志(N))的时间;但我相信,以pre选择,以减少的 O(N)的时间。

There are a generic algotithm to add and use this kind of "pre-selection", to reduce the quadratic, O(N^2), time? Perhaps, as showed by comments and @wildplasser, a O(N log(N)) time; but I expect, with "pre-selection" to reduce to O(N) time.

(编辑)

我尝试使用替代算法,但它需要一些改进的解决方案在这里使用;或者,要真正提高性能(以 O(N)的时间),需要使用pre选择。

I try to use alternative algorithm, but it need some improvement to use as solution here; or, to really increase performance (to O(N) time), need to use "pre-selection".

的$ P $对 - 选择(约束)是基于一种超集分组...由原始的问题 T1 表,

The "pre-selection" (constraint) is based on a "super-set grouping"... Stating by the original "How to label 'transitive groups' with SQL?" question t1 table,

  table T1
  (original T1 augmented by "super-set grouping label" ssg, and more one row)
  ID1 | ID2 | ssg
  1   | 2   | 1
  1   | 5   | 1
  4   | 7   | 1
  7   | 8   | 1
  9   | 1   | 1
  10  | 11  | 2

因此​​,有三组,

So there are three groups,

  • G1 :{1,2,5,9},因为1的 T 的2,1的 T 5和9的 T 的1
  • G2 :{4,7,8},因为4的 T 的7和7的 T 的8
  • G3 :{10,11},因为10的 T 的11
  • g1: {1,2,5,9} because "1 t 2", "1 t 5" and "9 t 1"
  • g2: {4,7,8} because "4 t 7" and "7 t 8"
  • g3: {10,11} because "10 t 11"

超级组只是一个辅助的分组,

The super-group is only a auxiliary grouping,

  • ssg1 :{G1,G2}
  • ssg2 :{G3}
  • ssg1: {g1,g2}
  • ssg2: {g3}

如果我们的 M 的超级组项和的 N 的总 T1 项,平均集团长度少艾德里安N / M。我们可以假设(对我典型的问题),而且要 SSG 最大长度为〜N / M。

If we have M super-group-items and N total T1 items, the average group length will be less tham N/M. We can suppose (for my typical problem) also that ssg maximum length is ~N/M.

因此​​, 的标签算法需要用〜N / M只运行M倍项如果使用 SSG 约束。

So, the "label algorithm" need to run only M times with ~N/M items if it use the ssg constraint.

推荐答案

这是SQL只soulution似乎是有点问题在这里。随着一些程序的帮助在SQL之上编程解决方案似乎是failry简单高效。下面是一个简要概述的解决方案,可以使用任何过程语言调用SQL来实现。

An SQL only soulution appears to be a bit of a problem here. With the help of some proceduralprogramming on top of SQL the solution appears to be failry simple and efficient. Here is a brief outlineof a solution as could be implemented using any procedural language invoking SQL.

声明表研究与主键 ID ,其中 ID 对应同一个域中 ID1 ID2 表中的 T1 。表研究包含一个其他非键列,一个标签

Declare table R with primary key ID where ID corresponds the same domain as ID1 and ID2 of table T1.Table R contains one other non-key column, a Label number

填写表格研究 T1 中值的范围。设置标签为零(无标签)。使用您的数据。例如,初始设置为研究看起来像:

Populate table R with the range of values found in T1. Set Label to zero (no label).Using your example data, the initial setup for R would look like:

Table R
ID Label
== =====
 1 0
 2 0
 4 0
 5 0
 7 0
 8 0
 9 0

使用宿主语言光标加上辅助计数器,读 T1 的每一行。查找 ID1 ID2 研究。你会发现之一四种情况:

Using a host language cursor plus an auxiliary counter, read each row from T1. Lookup ID1 and ID2 in R. You will find one offour cases:

 Case 1: ID1.Label == 0 and ID2.Label == 0

在这种情况下,无论这些 ID 取值已经看到之前之一:加1到柜台,然后更新两个行研究到计数器的值:更新R SET R.Label =:柜台里R.ID在(:ID1,:ID2)

In this case neither one of these IDs have been "seen" before: Add 1 to the counter and then update bothrows of R to the value of the counter: update R set R.Label = :counter where R.ID in (:ID1, :ID2)

 Case 2: ID1.Label == 0 and ID2.Label <> 0

在这种情况下, ID1 是新的,但 ID2 已分配的标签。 ID1 需要被分配到相同的标签为 ID2 更新R SET R.Lablel =:ID2.Label其中,R.ID =:ID1

In this case, ID1 is new but ID2 has already been assigned a label. ID1 needs to be assigned to thesame label as ID2: update R set R.Lablel = :ID2.Label where R.ID = :ID1

 Case 3: ID1.Label <> 0 and ID2.Label == 0

在这种情况下, ID2 是新的,但 ID1 已分配的标签。 ID2 需要被分配到相同的标签为 ID1 更新R SET R.Lablel =:ID1.Label其中,R.ID =:ID2

In this case, ID2 is new but ID1 has already been assigned a label. ID2 needs to be assigned to thesame label as ID1: update R set R.Lablel = :ID1.Label where R.ID = :ID2

 Case 4: ID1.Label <> 0 and ID2.Label <> 0

在此情况下,该行包含冗余信息。两排研究应包含相同的标签值。如果不,有某种形式的数据完整性问题。哈啊...不太看到编辑......

In this case, the row contains redundant information. Both rows of R should contain the same Label value. If not,there is some sort of data integrity problem. Ahhhh... not quite see edit...

修改我才意识到,有其中两个标签值这里可能是非零和不同的情况。如果两者都非零和不同,那么需要两个标签集团在这一点上要合并。所有你需要做的是选择一个标签和更新等相匹配的东西,如:更新R SET R.Label为ID1.Label其中R .Label = ID2.Label 。现在,这两个集团已合并到同一个标签值。

EDIT I just realized that there are situations where both Label values here could be non-zero and different. If both are non-zero and different then two Label groups need to be merged at this point. All you need to do is choose one Label and update the others to match with something like: update R set R.Label to ID1.Label where R.Label = ID2.Label. Now both groups have been merged with the same Label value.

当光标完成后,表研究将包含需要更新 T2 标签值。

Upon completion of the cursor, table R will contain Label values needed to update T2.

Table R
ID Label
== =====
 1 1
 2 1
 4 2
 5 1
 7 2
 8 2
 9 1

进程表 T2 使用线沿线的东西:设置T2.Label为R.Label,其中T2.ID1 = R.ID 。最终的结果应该是:

Process table T2using something along the lines of: set T2.Label to R.Label where T2.ID1 = R.ID. The end result should be:

  table T2
  ID1 | ID2 | LABEL
  1   | 2   | 1
  1   | 5   | 1
  4   | 7   | 2
  7   | 8   | 2
  9   | 1   | 1

这个过程是puerly迭代的,应扩展到相当大的表没有困难。

This process is puerly iterative and should scale to fairly large tables without difficulty.

这篇关于如何标记一个大集“传递组”与约束?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-21 14:21