Question
I am using Python 3.4 and xlrd. I want to sort an Excel sheet by its primary column before processing it. Is there a library that can do this?
Answer
There are a couple of ways to do this. The first option is to use xlrd, since the question is tagged with it. The biggest downside is that xlrd only reads workbooks, and its companion writer, xlwt, produces the legacy XLS format rather than XLSX.
These examples assume an Excel document named test.xlsx whose sheet, Sheet1, contains a single column with the header "Header Row".
Using xlrd and a few modifications from this answer:
import xlwt
from xlrd import open_workbook
# Note: xlrd 2.0 and later read only .xls files, so opening an .xlsx file needs xlrd < 2.0
target_column = 0  # This example only has 1 column, and it is 0-indexed
book = open_workbook('test.xlsx')
sheet = book.sheets()[0]
# Read every row as a list of cell values
data = [sheet.row_values(i) for i in range(sheet.nrows)]  # range, not xrange, on Python 3
labels = data[0]  # Don't sort our headers
data = data[1:]  # Data begins on the second row
data.sort(key=lambda x: x[target_column])
bk = xlwt.Workbook()
out_sheet = bk.add_sheet(sheet.name)
# Write the header row first, then the sorted data rows beneath it
for idx, label in enumerate(labels):
    out_sheet.write(0, idx, label)
for idx_r, row in enumerate(data):
    for idx_c, value in enumerate(row):
        out_sheet.write(idx_r + 1, idx_c, value)
bk.save('result.xls')  # Notice this is xls, not xlsx like the original file
This writes the header row, followed by the sorted data rows, to result.xls.
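One thing to watch for on Python 3: a plain sort raises TypeError when the target column mixes numbers and text, because floats and strings cannot be compared. The sketch below shows one possible way around that, using in-memory rows shaped like the lists that row_values returns. The helper name sort_rows and the sample data are hypothetical and only for illustration; the choice to place numbers before text is an assumption, not something the original answer specifies.

```python
def sort_rows(rows, target_column=0, has_header=True):
    """Return rows sorted by one column, keeping the header row first."""
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else list(rows)

    def sort_key(row):
        value = row[target_column]
        # The first tuple element groups numbers (False) before text (True),
        # so values of different types are never compared with each other.
        return (isinstance(value, str), value)

    return header + sorted(body, key=sort_key)


# Hypothetical sample rows: one header, then mixed numeric and text cells
rows = [
    ["Header Row"],
    ["pear"],
    [3.0],
    ["apple"],
    [1.0],
]
sorted_rows = sort_rows(rows)
print(sorted_rows)
```

If the column holds only one type, the simpler key=lambda x: x[target_column] from the answer is enough.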
Another option (and one that can produce XLSX output) is to use pandas. The code is also shorter:
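import pandas as pd
xl = pd.ExcelFile("test.xlsx")
df = xl.parse("Sheet1")
# DataFrame.sort was removed in pandas 0.20; sort_values is its replacement
df = df.sort_values(by="Header Row")
# The context manager saves and closes the file; ExcelWriter.save() was removed in pandas 2.0
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', columns=["Header Row"], index=False)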
This writes the sorted column to output.xlsx.
In the to_excel call, index is set to False so that the pandas DataFrame index isn't included in the Excel document. The rest of the keywords should be self-explanatory.
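To see why leaving the index out matters, here is a minimal sketch with a small in-memory DataFrame instead of an Excel file. The sample values are made up for illustration. After sorting, each row keeps its original index label, so the index is no longer in order and would appear as a confusing extra column if it were written out.

```python
import pandas as pd

# Hypothetical data standing in for the single "Header Row" column
df = pd.DataFrame({"Header Row": [3, 1, 2]})
sorted_df = df.sort_values(by="Header Row")

# The values are sorted, but the index still carries the original row labels
print(sorted_df["Header Row"].tolist())
print(sorted_df.index.tolist())
```

If a fresh 0-based index is wanted after sorting, sorted_df.reset_index(drop=True) produces one.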