我最近才开始使用Python进行编码,并且有很多东西需要学习。我的代码的目标是从一个单元格中提取一个字符串,检查其字符长度,并用特定的缩写替换单词。然后,我将新字符串写入另一个Excel工作表,并在所有数据都减少之后保存。我终于想出了如何使其工作的方法,但是确实需要很长时间。我正在使用10,000多个字符串单元,而我的循环迭代可能还远未优化。如果您有任何有用的信息,那将是很好的。
import xlwt
import xlrd
book = xlrd.open_workbook() # opens excel file for data input
reduc = xlwt.Workbook() # creates the workbook that the reduced data will be saved in
# Calls the sheets I will be working with
Data = book.sheet_by_index(3)
Table = book.sheet_by_index(5)
sheet1 = reduc.add_sheet("sheet 1")
# the initial loop pulls the string from excel
for x in xrange(30): # I use a limited range for debugging
From = str(Data.col(15)[x].value)
To = str(Data.col(16)[x].value)
print x # I just print this to let me know that i'm not stuck
if len(From) <= 30 and len(To) <= 30:
sheet1.write(x, 3, From)
sheet1.write(x, 4, To)
else:
while len(From) > 30 or len(To) > 30:
for y in xrange(Table.nrows):
word = str(Table.col(0)[y].value)
abbrv = str(Table.col(1)[y].value)
if len(From) > 30:
From = From.replace(word, abbrv)
if len (To) > 30:
To = To.replace(word, abbrv)
sheet1.write(x, 3, From)
sheet1.write(x, 4, To)
break
reduc.save("newdoc.xls")
print " DONE!
下面是我的更新代码。这几乎是即时的,这正是我所期望的。我预加载了我想要的所有列,然后通过相同的循环系统运行它。然后,我存储而不是将数据写入新的excel文件。减少所有数据后,我将每个单元格保存在单独的for循环中。谢谢你们的建议。
import xlwt
import xlrd
# Workbook must be located in the Python27 folder in the C:/directory
book = xlrd.open_workbook() # opens exel file for data input
# Calls the sheets I will be working with
Data = book.sheet_by_index(0)
Table = book.sheet_by_index(1)
# Import column data from excel
From = Data.col_values(15)
To = Data.col_values(16)
word = Table.col_values(0)
abbrv = Table.col_values(1)
# Empty variables to be filled with reduced string
From_r = []
To_r = []
# Notes to be added
for x in xrange(Data.nrows):
if len(From[x]) <= 28 and len(To[x]) <= 28:
From_r.append(From[x])
To_r.append(To[x])
else:
while len(From[x]) > 28 or len(To[x]) > 28:
for y in xrange(Table.nrows):
if len(From[x]) > 28:
From[x] = From[x].replace(word[y], abbrv[y])
if len (To[x]) > 28:
To[x] = To[x].replace(word[y], abbrv[y])
From_r.append(From[x])
To_r.append(To[x])
break
# Create new excel file to write reduced strings into
reduc = xlwt.Workbook()
sheet1 = reduc.add_sheet("sheet 1")
# Itterate through list to write each object into excel
for i in xrange(Data.nrows):
sheet1.write(i, 3, From_r[i])
sheet1.write(i, 4, To_r[i])
# Save reduced string in new excel file
reduc.save("lucky.xls")
print " DONE! "
最佳答案
速度缓慢可能是因为替换代码效率低下。
您应该尝试加载所有单词和相应缩写的客栈,除非列表太大,否则将耗尽内存。
然后,为了提高速度,您可以一次性替换所有单词。
这样做并将其移出循环
words = [str(cell.value) for cell in Table.col(0)] #list comprehension
abbr = [str(cell.value) for cell in Table.col(1)]
replacements = zip(words, abbr)
here中的此函数使用正则表达式模块替换给定列表中的所有匹配项。
import re
def multiple_replacer(*key_values):
replace_dict = dict(key_values)
replacement_function = lambda match: replace_dict[match.group(0)]
pattern = re.compile("|".join([re.escape(k) for k, v in key_values]))
return lambda string: pattern.sub(replacement_function, string)
要使用它,请执行以下操作:
replaceFunc = multiple_replacer(*replacements) #constructs the function. Do this outside the loop, after the replacements have been gathered.
myString = replaceFunc(myString)