问题描述
@NealB解决方案后编辑:@ NealB的解决方案是非常非常快的comparated与任何另外一个和免除有关添加约束,以提高性能这个新问题。在@ NealB不是需要任何改善,已经的 O(N)的时间,是非常简单的。
EDIT after @NealB solution: the @NealB's solution is very very fast comparated with any another one, and dispenses this new question about "add a constraint to improve performance". The @NealB's not need any improve, have O(n) time and is very simple.
的问题标签传递组,SQL 拥有的一个优雅的解决方案。但是这种解决方案会消耗指数时间(!)。我需要10000 itens工作:1000 itens需要1秒,与2000年需要一天的时间...
The problem of "label transitive groups with SQL" have an elegant solution using recursion and CTE... But this solution consumes an exponential time (!). I need to work with 10000 itens: with 1000 itens need 1 second, with 2000 need 1 day...
约束的:在我的情况可以把问题分解成块〜100 itens或者更少,但只能选择一个组〜10 itens,并放弃所有其他的〜90标itens ......
Constraint: in my case is possible to break the problem into pieces of ~100 itens or less, but only to select one group of ~10 itens, and discard all the other ~90 labeled itens...
有一个通用的algotithm添加和使用这种pre选择,减少了二次, O(N ^ 2)的时间?或许,作为显示用的意见和@wildplasser,一个的 O(N日志(N))的时间;但我相信,以pre选择,以减少的 O(N)的时间。
There are a generic algotithm to add and use this kind of "pre-selection", to reduce the quadratic, O(N^2), time? Perhaps, as showed by comments and @wildplasser, a O(N log(N)) time; but I expect, with "pre-selection" to reduce to O(N) time.
(编辑)
我尝试使用替代算法,但它需要一些改进的解决方案在这里使用;或者,要真正提高性能(以 O(N)的时间),需要使用pre选择。
I try to use alternative algorithm, but it need some improvement to use as solution here; or, to really increase performance (to O(N) time), need to use "pre-selection".
的$ P $对 - 选择(约束)是基于一种超集分组...由原始的问题 T1
表,
The "pre-selection" (constraint) is based on a "super-set grouping"... Stating by the original "How to label 'transitive groups' with SQL?" question t1
table,
table T1
(original T1 augmented by "super-set grouping label" ssg, and more one row)
ID1 | ID2 | ssg
1 | 2 | 1
1 | 5 | 1
4 | 7 | 1
7 | 8 | 1
9 | 1 | 1
10 | 11 | 2
因此,有三组,
So there are three groups,
-
G1
:{1,2,5,9},因为1的 T 的2,1的 T 5和9的 T 的1 -
G2
:{4,7,8},因为4的 T 的7和7的 T 的8 -
G3
:{10,11},因为10的 T 的11
g1
: {1,2,5,9} because "1 t 2", "1 t 5" and "9 t 1"g2
: {4,7,8} because "4 t 7" and "7 t 8"g3
: {10,11} because "10 t 11"
超级组只是一个辅助的分组,
The super-group is only a auxiliary grouping,
-
ssg1
:{G1,G2} -
ssg2
:{G3}
ssg1
: {g1,g2}ssg2
: {g3}
如果我们的 M 的超级组项和的 N 的总 T1
项,平均集团长度少艾德里安N / M。我们可以假设(对我典型的问题),而且要 SSG
最大长度为〜N / M。
If we have M super-group-items and N total T1
items, the average group length will be less tham N/M. We can suppose (for my typical problem) also that ssg
maximum length is ~N/M.
因此, 的标签算法需要用〜N / M只运行M倍项如果使用 SSG
约束。
So, the "label algorithm" need to run only M times with ~N/M items if it use the ssg
constraint.
推荐答案
这是SQL只soulution似乎是有点问题在这里。随着一些程序的帮助在SQL之上编程解决方案似乎是failry简单高效。下面是一个简要概述的解决方案,可以使用任何过程语言调用SQL来实现。
An SQL only soulution appears to be a bit of a problem here. With the help of some proceduralprogramming on top of SQL the solution appears to be failry simple and efficient. Here is a brief outlineof a solution as could be implemented using any procedural language invoking SQL.
声明表研究
与主键 ID
,其中 ID
对应同一个域中 ID1
和 ID2
表中的 T1
。表研究
包含一个其他非键列,一个标签
号
Declare table R
with primary key ID
where ID
corresponds the same domain as ID1
and ID2
of table T1
.Table R
contains one other non-key column, a Label
number
填写表格研究
与 T1
中值的范围。设置标签
为零(无标签)。使用您的数据。例如,初始设置为研究
看起来像:
Populate table R
with the range of values found in T1
. Set Label
to zero (no label).Using your example data, the initial setup for R
would look like:
Table R
ID Label
== =====
1 0
2 0
4 0
5 0
7 0
8 0
9 0
使用宿主语言光标加上辅助计数器,读 T1
的每一行。查找 ID1
和 ID2
在研究
。你会发现之一四种情况:
Using a host language cursor plus an auxiliary counter, read each row from T1
. Lookup ID1
and ID2
in R
. You will find one offour cases:
Case 1: ID1.Label == 0 and ID2.Label == 0
在这种情况下,无论这些 ID
取值已经看到之前之一:加1到柜台,然后更新两个行研究
到计数器的值:更新R SET R.Label =:柜台里R.ID在(:ID1,:ID2)
In this case neither one of these ID
s have been "seen" before: Add 1 to the counter and then update bothrows of R
to the value of the counter: update R set R.Label = :counter where R.ID in (:ID1, :ID2)
Case 2: ID1.Label == 0 and ID2.Label <> 0
在这种情况下, ID1
是新的,但 ID2
已分配的标签。 ID1
需要被分配到相同的标签为 ID2
:更新R SET R.Lablel =:ID2.Label其中,R.ID =:ID1
In this case, ID1
is new but ID2
has already been assigned a label. ID1
needs to be assigned to thesame label as ID2
: update R set R.Lablel = :ID2.Label where R.ID = :ID1
Case 3: ID1.Label <> 0 and ID2.Label == 0
在这种情况下, ID2
是新的,但 ID1
已分配的标签。 ID2
需要被分配到相同的标签为 ID1
:更新R SET R.Lablel =:ID1.Label其中,R.ID =:ID2
In this case, ID2
is new but ID1
has already been assigned a label. ID2
needs to be assigned to thesame label as ID1
: update R set R.Lablel = :ID1.Label where R.ID = :ID2
Case 4: ID1.Label <> 0 and ID2.Label <> 0
在此情况下,该行包含冗余信息。两排研究
应包含相同的标签值。如果不,有某种形式的数据完整性问题。哈啊...不太看到编辑......
In this case, the row contains redundant information. Both rows of R
should contain the same Label value. If not,there is some sort of data integrity problem. Ahhhh... not quite see edit...
修改我才意识到,有其中两个标签
值这里可能是非零和不同的情况。如果两者都非零和不同,那么需要两个标签
集团在这一点上要合并。所有你需要做的是选择一个标签
和更新等相匹配的东西,如:更新R SET R.Label为ID1.Label其中R .Label = ID2.Label
。现在,这两个集团已合并到同一个标签
值。
EDIT I just realized that there are situations where both Label
values here could be non-zero and different. If both are non-zero and different then two Label
groups need to be merged at this point. All you need to do is choose one Label
and update the others to match with something like: update R set R.Label to ID1.Label where R.Label = ID2.Label
. Now both groups have been merged with the same Label
value.
当光标完成后,表研究
将包含需要更新 T2
标签值。
Upon completion of the cursor, table R
will contain Label values needed to update T2
.
Table R
ID Label
== =====
1 1
2 1
4 2
5 1
7 2
8 2
9 1
进程表 T2
使用线沿线的东西:设置T2.Label为R.Label,其中T2.ID1 = R.ID
。最终的结果应该是:
Process table T2
using something along the lines of: set T2.Label to R.Label where T2.ID1 = R.ID
. The end result should be:
table T2
ID1 | ID2 | LABEL
1 | 2 | 1
1 | 5 | 1
4 | 7 | 2
7 | 8 | 2
9 | 1 | 1
这个过程是puerly迭代的,应扩展到相当大的表没有困难。
This process is puerly iterative and should scale to fairly large tables without difficulty.
这篇关于如何标记一个大集“传递组”与约束?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!