Problem description
The following is only the beginning of a Coursera assignment on Data Science. I hope this is not too trivial, but I am lost and could not find an answer.
I am asked to import an Excel file into a pandas DataFrame and to manipulate it afterwards. The file can be found here: http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls
What makes it difficult for me is that:
a) there is an 'overhead' of 17 lines and a footer
b) the first two columns are empty
c) the index column has no header name
After hours of searching and reading I came up with this useless line:
energy=pd.read_excel('Energy Indicators.xls',
sheetname='Energy',
header=16,
skiprows=[17],
skipfooter=38,
skipcolumns=2
)
This seems to produce a MultiIndex DataFrame, though the command energy.head() returns nothing.
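I have two questions:
- What did I do wrong? Until this exercise I thought I understood DataFrames, but now I am totally clueless and lost :-((
- How do I solve this? What do I need to do to get this Excel data into a DataFrame whose index consists of the countries?
Thanks.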
Recommended answer
I think you need to add these parameters:
- index_col to convert a column to the index
- usecols to parse columns by position
- change the header position to 15
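import pandas as pd

# header=15 picks the row holding the column names; usecols skips the two
# empty leading columns; index_col=[0] turns the first kept column into the index.
energy = pd.read_excel('Energy Indicators.xls',
                       sheet_name='Energy',  # spelled `sheetname` in older pandas releases
                       skiprows=[17],
                       skipfooter=38,
                       header=15,
                       index_col=[0],
                       usecols=[2, 3, 4, 5]
                       )
print(energy.head())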
                Energy Supply Energy Supply per capita  \
Afghanistan               321                       10
Albania                   102                       35
Algeria                  1959                       51
American Samoa            ...                      ...
Andorra                     9                      121

                Renewable Electricity Production
Afghanistan                            78.669280
Albania                               100.000000
Algeria                                 0.551010
American Samoa                          0.641026
Andorra                                88.695650
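To experiment with how header, skipfooter, usecols and index_col interact without downloading the spreadsheet, here is a minimal sketch under stated assumptions. It swaps pd.read_excel for pd.read_csv on a small in-memory text, since the two functions share these parameters. The sample rows, the index name 'Country', and the choice to treat the '...' cells as missing values are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical miniature of the sheet: two junk lines on top, two empty
# leading columns, an unnamed index column, and a one-line footer.
raw = (
    "Environmental Indicators: Energy,,,,,\n"
    "(illustrative overhead line),,,,,\n"
    ",,,Energy Supply,Energy Supply per capita,Renewable Electricity Production\n"
    ",,Afghanistan,321,10,78.66928\n"
    ",,Albania,102,35,100.0\n"
    ",,American Samoa,...,...,0.641026\n"
    "footer note,,,,,\n"
)

energy = pd.read_csv(
    io.StringIO(raw),
    header=2,              # zero-based row that holds the column names
    skipfooter=1,          # drop the footer line (needs the python engine)
    usecols=[2, 3, 4, 5],  # ignore the two empty leading columns
    index_col=0,           # first of the *kept* columns becomes the index
    na_values=["..."],     # assumption: '...' marks missing data
    engine="python",
)

# The index column has no header in the file, so give it a name by hand.
energy.index.name = "Country"
print(energy)
```

Once the small case behaves as expected, the same parameter values can be scaled up to the real file (header=15, skipfooter=38) and passed to pd.read_excel.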