How to write a CSV file with multiple header rows using pandas to_csv()?

Problem description

Consider a data frame with a date column as its index and three columns x, y and z holding some observations. I want to write the contents of this data frame to a .csv file. I know I can use df.to_csv for this; however, I would like to add a second header line with the units. In this example, the desired .csv file would look something like this:

date,x,y,z
(yyyy-mm-dd),(s),(m),(kg)
2014-03-12,1,2,3
2014-03-13,4,5,6
...

Recommended answer

This doesn't produce the exact output in your example, but it's close. You can use MultiIndex columns to store the second header (the units) alongside the column labels:

>>> import pandas as pd
>>> columns = pd.MultiIndex.from_tuples(
...     list(zip(['date', 'x', 'y', 'z'],
...              ['(yyyy-mm-dd)', '(s)', '(m)', '(kg)'])))
>>> data = [['2014-03-12', 1, 2, 3],
...         ['2014-03-13', 4, 5, 6]]
>>> df = pd.DataFrame(data, columns=columns)
>>> df
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6

Storing the second header this way allows your columns to keep the correct type (e.g., column x should be an integer type):

>>> df.dtypes
date  (yyyy-mm-dd)    object
x     (s)              int64
y     (m)              int64
z     (kg)             int64
dtype: object

If you had stored the second header as a row in the DataFrame, your column dtypes would become object, which you probably don't want.
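
The dtype difference described above can be checked directly. The following is a minimal sketch, not part of the original answer; the names with_units_row and with_units_level are made up for illustration, and the data is the two-row sample from the question.

```python
import pandas as pd

labels = ["date", "x", "y", "z"]
units = ["(yyyy-mm-dd)", "(s)", "(m)", "(kg)"]
rows = [["2014-03-12", 1, 2, 3],
        ["2014-03-13", 4, 5, 6]]

# Variant 1 (hypothetical name): units stored as the first data row.
# Each column now mixes strings and integers.
with_units_row = pd.DataFrame([units] + rows, columns=labels)

# Variant 2 (hypothetical name): units stored as a second column level.
# The data rows stay purely numeric for x, y and z.
columns = pd.MultiIndex.from_tuples(list(zip(labels, units)))
with_units_level = pd.DataFrame(rows, columns=columns)

print(with_units_row.dtypes)
print(with_units_level.dtypes)
```

With the units row mixed into the data, arithmetic such as with_units_row["x"].sum() no longer works on plain integers, which is the practical cost the answer is warning about.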

Writing the DataFrame in CSV format produces something very similar to your example:

>>> df.to_csv('out.csv', index=False)
>>> !cat out.csv
date,x,y,z
(yyyy-mm-dd),(s),(m),(kg)
,,,
2014-03-12,1,2,3
2014-03-13,4,5,6

The only difference is the extra line of commas, which is how the pandas version used here separates multi-row headers from the actual rows of data (newer releases may omit this line when index=False). Either way, the CSV file can be read back into an equivalent DataFrame:

>>> df2 = pd.read_csv('out.csv', header=[0, 1])
>>> df2
          date   x   y    z
  (yyyy-mm-dd) (s) (m) (kg)
0   2014-03-12   1   2    3
1   2014-03-13   4   5    6
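
The round trip can also be checked programmatically rather than by eye. This is a small sketch under the same sample data; it writes to an in-memory buffer instead of out.csv, and the variable names same_labels, same_values and numeric_kept are illustrative only.

```python
import io
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    list(zip(["date", "x", "y", "z"],
             ["(yyyy-mm-dd)", "(s)", "(m)", "(kg)"]))
)
df = pd.DataFrame(
    [["2014-03-12", 1, 2, 3], ["2014-03-13", 4, 5, 6]],
    columns=columns,
)

# Write to an in-memory buffer so nothing touches the disk.
buf = io.StringIO()
df.to_csv(buf, index=False)

# Read it back, telling pandas that the first two lines are header rows.
buf.seek(0)
df2 = pd.read_csv(buf, header=[0, 1])

# Compare column labels, cell values, and the dtypes of x, y and z.
same_labels = list(df2.columns) == list(df.columns)
same_values = df2.values.tolist() == df.values.tolist()
numeric_kept = all(pd.api.types.is_integer_dtype(df2[c]) for c in columns[1:])
print(same_labels, same_values, numeric_kept)
```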

Note: I found a lot of this information scattered throughout this SO question.
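
If the file must match the layout in the question exactly, one possible alternative (not from the original answer) is to write the two header lines yourself and let to_csv emit only the data rows. The sketch below assumes the date index and a units mapping as in the question; to_csv_with_units and units are hypothetical names, not pandas API.

```python
import csv
import io
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 4], "y": [2, 5], "z": [3, 6]},
    index=pd.Index(["2014-03-12", "2014-03-13"], name="date"),
)

# Hypothetical mapping from column name to unit string.
units = {"date": "(yyyy-mm-dd)", "x": "(s)", "y": "(m)", "z": "(kg)"}


def to_csv_with_units(frame, units, buf):
    """Write a name row, a units row, then the data rows without a header."""
    names = [frame.index.name] + list(frame.columns)
    writer = csv.writer(buf, lineterminator="\n")  # handles quoting if needed
    writer.writerow(names)
    writer.writerow([units[name] for name in names])
    frame.to_csv(buf, header=False)  # index is written as the first field


buf = io.StringIO()
to_csv_with_units(df, units, buf)
text = buf.getvalue()
print(text)
```

The trade-off is that the units live outside the DataFrame, so they have to be carried around separately, whereas the MultiIndex approach keeps labels and units together.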
