Question
I am trying to write a pandas DataFrame to an .xlsx file where different numerical columns would have different formats. For example, some would show only two decimal places, some would show none, some would be formatted as percents with a "%" symbol, etc.
I noticed that DataFrame.to_html() has a formatters parameter that allows one to do just that, mapping different formats to different columns. However, there is no similar parameter on the DataFrame.to_excel() method. The most we have is a float_format that is global to all numbers.
I have read many SO posts that are at least partly related to my question, for example:
- Use the older openpyxl engine to apply formats one cell at a time. This is the approach with which I've had the most success. But it means writing loops to apply formats cell-by-cell, remembering offsets, etc.
- Render percentages by changing the table data itself into strings. Going the route of altering the actual data inspired me to try dealing with decimal place formatting by calling round() on each column before writing to Excel. This works too, but I'd like to avoid altering the data.
- Assorted others, mostly about date formats
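For reference, the cell-by-cell openpyxl approach from the first bullet might look roughly like the sketch below. The column names, the number_formats mapping, and the output file name are made up for illustration, and the row offset assumes the index is not written, so the header occupies row 1.

```python
import pandas as pd

# Illustrative data; the column names are assumptions, not from the question.
df = pd.DataFrame({
    "amount": [1010.5, 2020.25, 3030.125],
    "count": [1.0, 2.0, 3.0],
    "share": [0.1, 0.2, 0.33],
})

# Hypothetical mapping from column name to an Excel number format string.
number_formats = {"amount": "0.00", "count": "0", "share": "0%"}

with pd.ExcelWriter("openpyxl_formats.xlsx", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    worksheet = writer.sheets["Sheet1"]

    # openpyxl rows and columns are 1-based.
    for col_idx, name in enumerate(df.columns, start=1):
        fmt = number_formats.get(name)
        if fmt is None:
            continue
        # Row 1 holds the header, so the data starts at row 2.
        for row_idx in range(2, len(df) + 2):
            worksheet.cell(row=row_idx, column=col_idx).number_format = fmt
```

The stored values are untouched; only the display format of each cell changes. The offsets in the two loops are exactly the bookkeeping I would like to avoid.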
Are there other more convenient Excel-related functions/properties in the pandas API that can help here, or something similar on openpyxl, or perhaps some way to specify output format metadata directly onto each column in the DataFrame that would then be interpreted downstream by different outputters?
Recommended answer
You can do this with Pandas 0.16 and the XlsxWriter engine by accessing the underlying workbook and worksheet objects:
import pandas as pd
# Create a Pandas dataframe from some data.
df = pd.DataFrame(list(zip(
    [1010, 2020, 3030, 2020, 1515, 3030, 4545],
    [.1, .2, .33, .25, .5, .75, .45],
    [.1, .2, .33, .25, .5, .75, .45],
)))
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})
format3 = workbook.add_format({'num_format': 'h:mm:ss AM/PM'})
# Set the column width and format.
worksheet.set_column('B:B', 18, format1)
# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)
worksheet.set_column('D:D', 16, format3)
# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; use close() instead.)
writer.close()
Output: [screenshot of the formatted worksheet, not preserved]
See also Working with Python Pandas and XlsxWriter.
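If you would rather refer to columns by name than by Excel letters, the same idea can be wrapped in a small helper. This is only a sketch: write_formatted, the sample column names, and the output file name are hypothetical. It assumes the index is not written (so DataFrame column positions match worksheet columns) and that the column labels are unique.

```python
import pandas as pd


def write_formatted(df, path, num_formats, sheet_name="Sheet1"):
    """Write df to path, applying one Excel number format per named column.

    Hypothetical helper: num_formats maps column name -> Excel format string.
    """
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # index=False keeps DataFrame column positions aligned with the sheet.
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]

        for name, num_format in num_formats.items():
            col_idx = df.columns.get_loc(name)  # 0-based column position
            cell_format = workbook.add_format({"num_format": num_format})
            # Width of None leaves the column width unchanged.
            worksheet.set_column(col_idx, col_idx, None, cell_format)


# Illustrative data; the column names are assumptions.
df = pd.DataFrame({
    "amount": [1010.5, 2020.25, 3030.125],
    "count": [1.0, 2.0, 3.0],
    "share": [0.1, 0.2, 0.33],
})

write_formatted(
    df,
    "formatted.xlsx",
    {"amount": "#,##0.00", "count": "0", "share": "0%"},
)
```

As in the code above, a column format only applies to cells that were written without their own cell format, which is the case for the plain numeric cells pandas writes. The underlying values are stored unchanged.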