问题描述
我正在编写一个程序,该程序需要9个字符,创建所有可能的排列,并为每个字符获取字典文件,然后创建一组所有可能的单词.我需要做的是将所有排列与单词进行比较,然后返回匹配项.
I'm writing a program that takes 9 characters, creates all possible permutations, and grabs a dictionary files for each character and then creates a set of all possible words. What I need to do is compare all permutations to words and return matches.
import os, itertools
def parsed(choices):
mySet = set()
location = os.getcwd()
for item in choices:
filename = location + "\\dicts\\%s.txt" % (item)
mySet.update(open(filename).read().splitlines())
return mySet
def permutations(input):
possibilities = []
pospos = []
for x in range(3,9):
pospos.append([''.join(i) for i in itertools.permutations(input, x)])
for pos in pospos:
for i in pos:
possibilities.append(i)
return possibilities
有问题的功能是这个:
def return_matches():
matches = []
words = parsed(['s','m','o','k','e', 'j', 'a', 'c', 'k'])
pos = permutations(['s','m','o','k','e', 'j', 'a', 'c', 'k'])
for item in pos:
if item in words:
matches.append(item)
return matches
此代码应返回:
matches = ['a', 'om', 'ja', 'jo', ..., 'jacks', 'cokes', 'kecks', 'jokes', 'cakes', 'smoke', 'comes', 'makes', 'cameos']
如果我使此代码正常工作,则需要10到15分钟才能完成.另一方面,每次尝试在指定的时间内执行此操作,只能用5个或更少的字符来完成,否则返回错误的结果.
If I get this code to work properly, it takes 10 - 15 minutes to complete. On the other hand, every attempt at making this execute within allotted time, it can only be done with 5 or less characters or returns the wrong result.
所以我的问题是如何优化此代码以在30秒内返回正确的结果.
So my question is how to optimize this code to return the right result, within 30 seconds time.
修改 http://www.mso.anu.edu.au/~ralph/OPTED/v003 这是我要从中抓取词典文件的网站.
Edithttp://www.mso.anu.edu.au/~ralph/OPTED/v003 this is the website I'm scraping the dictionary files from.
推荐答案
在测试所有排列是否有效之前,浪费了RAM和时间将所有排列存储在列表中.相反,请在生成排列时对其进行测试,然后将有效的排列保存到集合中以消除重复.
It wastes RAM and time storing all the permutations in a list before you test if they're valid. Instead, test the permutations as you generate them, and save the valid ones into a set to eliminate duplicates.
由于 itertools.permutations
作品:
Duplicates are possible because of the way itertools.permutations
works:
您输入的单词"SMOKEJACK"包含2个K,因此每个包含K的排列都会生成两次.
Your input word "SMOKEJACK" contains 2 Ks, so every permutation containing K gets generated twice.
无论如何,这里有一些代码使用 SOWPODS 拼写英语单词列表.
Anyway, here's some code that uses the SOWPODS Scrabble word list for English.
from itertools import permutations
# Get all the words from the SOWPODS file
all_words = set('AI')
fname = 'scrabble_wordlist_sowpods.txt'
with open(fname) as f:
all_words.update(f.read().splitlines())
print(len(all_words))
choices = 'SMOKEJACK'
# Generate all permutations of `choices` from length 3 to 8
# and save them in a set to eliminate duplicates.
matches = set()
for n in range(3, 9):
for t in permutations(choices, n):
s = ''.join(t)
if s in all_words:
matches.add(s)
for i, s in enumerate(sorted(matches)):
print('{:3} {}'.format(i, s))
输出
216555
0 ACE
1 ACES
2 ACME
3 ACMES
4 AESC
5 AKE
6 AKES
7 AMOK
8 AMOKS
9 ASK
10 CAKE
11 CAKES
12 CAM
13 CAME
14 CAMEO
15 CAMEOS
16 CAMES
17 CAMS
18 CASE
19 CASK
20 CEAS
21 COKE
22 COKES
23 COMA
24 COMAE
25 COMAKE
26 COMAKES
27 COMAS
28 COME
29 COMES
30 COMS
31 COS
32 COSE
33 COSMEA
34 EAS
35 EKKA
36 EKKAS
37 EMS
38 JACK
39 JACKS
40 JAK
41 JAKE
42 JAKES
43 JAKS
44 JAM
45 JAMES
46 JAMS
47 JOCK
48 JOCKS
49 JOE
50 JOES
51 JOKE
52 JOKES
53 KAE
54 KAES
55 KAM
56 KAME
57 KAMES
58 KAS
59 KEA
60 KEAS
61 KECK
62 KECKS
63 KEKS
64 KOA
65 KOAS
66 KOS
67 MAC
68 MACE
69 MACES
70 MACK
71 MACKS
72 MACS
73 MAE
74 MAES
75 MAK
76 MAKE
77 MAKES
78 MAKO
79 MAKOS
80 MAKS
81 MAS
82 MASE
83 MASK
84 MES
85 MESA
86 MOA
87 MOAS
88 MOC
89 MOCK
90 MOCKS
91 MOCS
92 MOE
93 MOES
94 MOKE
95 MOKES
96 MOS
97 MOSE
98 MOSK
99 OAK
100 OAKS
101 OCA
102 OCAS
103 OES
104 OKA
105 OKAS
106 OKE
107 OKES
108 OMS
109 OSE
110 SAC
111 SACK
112 SAE
113 SAKE
114 SAM
115 SAME
116 SAMEK
117 SCAM
118 SEA
119 SEAM
120 SEC
121 SECO
122 SKA
123 SKEO
124 SMA
125 SMACK
126 SMOCK
127 SMOKE
128 SOAK
129 SOC
130 SOCA
131 SOCK
132 SOJA
133 SOKE
134 SOMA
135 SOME
此代码在我相当古老的32位2GHz机器上(在Linux上运行Python 3.6.0)运行约2.5秒.在Python 2上,速度稍快(因为Python2字符串是ASCII,而不是Unicode).
This code runs in around 2.5 seconds on my rather ancient 32 bit 2GHz machine running Python 3.6.0 on Linux. It's slightly faster on Python 2 (since Python2 strings are ASCII, not Unicode).
这篇关于查找分配时间内的排列的所有匹配项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!