python - 使用xlrd/xlwt和循环迭代优化Excel数据收集/精简

我最近才开始使用Python进行编码，并且有很多东西需要学习。我的代码的目标是从一个单元格中提取一个字符串，检查其字符长度，并用特定的缩写替换单词。然后，我将新字符串写入另一个Excel工作表，并在所有数据都减少之后保存。我终于想出了如何使其工作的方法，但是确实需要很长时间。我正在使用10,000多个字符串单元，而我的循环迭代可能还远未优化。如果您有任何有用的信息，那将是很好的。

import xlwt
import xlrd

book = xlrd.open_workbook() # opens excel file for data input
reduc = xlwt.Workbook()     # creates the workbook that the reduced data will be saved in

# Calls the sheets I will be working with
Data = book.sheet_by_index(3)
Table = book.sheet_by_index(5)
sheet1 = reduc.add_sheet("sheet 1")

# the initial loop pulls the string from excel

for x in xrange(30): # I use a limited range for debugging
    From = str(Data.col(15)[x].value)
    To = str(Data.col(16)[x].value)
    print x # I just print this to let me know that i'm not stuck

    if len(From) <= 30 and len(To) <= 30:
        sheet1.write(x, 3, From)
        sheet1.write(x, 4, To)
    else:
        while len(From) > 30 or len(To) > 30:
            for y in xrange(Table.nrows):
                word = str(Table.col(0)[y].value)
                abbrv = str(Table.col(1)[y].value)
                if len(From) > 30:
                    From = From.replace(word, abbrv)
                if len (To) > 30:
                    To = To.replace(word, abbrv)
            sheet1.write(x, 3, From)
            sheet1.write(x, 4, To)
            break

reduc.save("newdoc.xls")
print " DONE!

下面是我的更新代码。这几乎是即时的，这正是我所期望的。我预加载了我想要的所有列，然后通过相同的循环系统运行它。然后，我存储而不是将数据写入新的excel文件。减少所有数据后，我将每个单元格保存在单独的for循环中。谢谢你们的建议。

import xlwt
import xlrd

# Workbook must be located in the Python27 folder in the C:/directory
book = xlrd.open_workbook() # opens exel file for data input

# Calls the sheets I will be working with
Data = book.sheet_by_index(0)
Table = book.sheet_by_index(1)

# Import column data from excel
From = Data.col_values(15)
To = Data.col_values(16)
word = Table.col_values(0)
abbrv = Table.col_values(1)

# Empty variables to be filled with reduced string
From_r = []
To_r = []

# Notes to be added
for x in xrange(Data.nrows):
    if len(From[x]) <= 28 and len(To[x]) <= 28:
        From_r.append(From[x])
        To_r.append(To[x])
    else:
        while len(From[x]) > 28 or len(To[x]) > 28:
            for y in xrange(Table.nrows):
                if len(From[x]) > 28:
                    From[x] = From[x].replace(word[y], abbrv[y])
                if len (To[x]) > 28:
                    To[x] = To[x].replace(word[y], abbrv[y])
            From_r.append(From[x])
            To_r.append(To[x])
            break

# Create new excel file to write reduced strings into
reduc = xlwt.Workbook()
sheet1 = reduc.add_sheet("sheet 1")

# Itterate through list to write each object into excel
for i in xrange(Data.nrows):
    sheet1.write(i, 3, From_r[i])
    sheet1.write(i, 4, To_r[i])

# Save reduced string in new excel file
reduc.save("lucky.xls")
print " DONE! "

最佳答案

速度缓慢可能是因为替换代码效率低下。
您应该尝试加载所有单词和相应缩写的客栈，除非列表太大，否则将耗尽内存。
然后，为了提高速度，您可以一次性替换所有单词。

这样做并将其移出循环

words = [str(cell.value) for cell in Table.col(0)] #list comprehension
abbr = [str(cell.value) for cell in Table.col(1)]
replacements = zip(words, abbr)

here中的此函数使用正则表达式模块替换给定列表中的所有匹配项。

import re
def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]))
    return lambda string: pattern.sub(replacement_function, string)

要使用它，请执行以下操作：

replaceFunc = multiple_replacer(*replacements) #constructs the function. Do this outside the loop, after the replacements have been gathered.
myString = replaceFunc(myString)