如果下面的每个字母都代表一个名字根据祖先的共同程度来分类的最好方法是什么?

A B C D
E F G H
I J K L
M N C D
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L

结果应该是:
I J K L # Three names is more important that two names
W J K L
X J K L
A B C D # C D is repeated more than G H
M N C D
O P C D
Q R C D
E F G H
S T G H
U V G H

编辑:
名字中可能有空格(Double names)。
考虑下面的例子,其中每个字母代表一个单词:
A B C D M
E F G H M
I J K L M
M N C D M
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L

输出应为:
A B C D M
M N C D M
I J K L M
E F G H M
W J K L
X J K L
O P C D
Q R C D
S T G H
U V G H

最佳答案

首先计算每个链的出现次数。然后根据计数对每个名字进行排序。试试这个:

from collections import defaultdict

words = """A B C D
E F G H
I J K L
M N C D
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L"""

words = words.split('\n')

# Count ancestors
counters = defaultdict(lambda: defaultdict(lambda: 0))
for word in words:
    parts = word.split()
    while parts:
        counters[len(parts)][tuple(parts)] += 1
        parts.pop(0)

# Calculate tuple of ranks, used for sorting
ranks = {}
for word in words:
    rank = []
    parts = word.split()
    while parts:
        rank.append(counters[len(parts)][tuple(parts)])
        parts.pop(0)
    ranks[word] = tuple(rank)

# Sort by ancestor count, longest chain comes first
words.sort(key=lambda word: ranks[word], reverse=True)
print(words)

10-08 04:17