本文介绍了读取按nan行拆分的数据框,并在Python中将其重塑为多个数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有一个示例Excel文件 data1.xlsx
来自
现在我想使用 openpyxl
或 pandas
,然后将它们转换为新的 df1
和 df2
最终将它们另存为价格
和数量
表:
价格表:
和数量表
代码我用过:
df = pd.read_excel('./ data1.xlsx',sheet_name ='Sheet1')
df_list = np.split(df,df [df.isnull()。all(1)]。index)
for df_list中的df:
print(df,'\n')
出局:
bj未命名:1未命名:2未命名:3未命名:4
0年2018.0 2019.0 2020.0总和
1价格12.0 4.0 5.0 21
2数量5.0 5.0 3.0 13
bj未命名: 1未命名:2未命名:3未命名:4
3 NaN NaN NaN NaN NaN
4 sh NaN NaN NaN NaN NaN
5年2018.0 2019.0 2020.0总和
6价格5.0 6.0 7.0 18
7数量7.0 5.0 4.0 16
bj未命名:1未命名:2未命名:3未命名:4
8 NaN NaN NaN NaN NaN
bj未命名:1未命名:2未命名: 3未命名:4
9 NaN NaN NaN NaN NaN
10 gz NaN NaN NaN NaN
11年2018.0 2019.0 2020.0总和
12价格2.0 3.0 1.0 6
13数量6.0 9.0 3.0 18
bj未命名:1未命名:2未命名:3未命名:4
14 NaN NaN NaN NaN NaN
bbj未命名:1未命名:2未命名:3未命名:4
15 NaN NaN NaN NaN NaN NaN
16 sz NaN NaN NaN NaN
17年2018.0 2019.0 2020.0总和
18价格8.0 2.0 3.0 13
19数量5.0 4.0 3.0 12
如何在Python中做到这一点?非常感谢。
解决方案
使用:
#add header =默认列名称
df = pd.read_excel('./ data1.xlsx',sheet_name ='Sheet1',header = None)
#转换列第二行
df.columns = df.iloc [1] .rename(None)
#通过第二列
df向前填充非缺失值来创建新列`city` .insert(0,'city',df.iloc [:, 0] .mask(df.iloc [:, 1] .notna())。ffill())
#将浮点数转换为整数
df.columns = [如果isinstance(x,float)则为int(x),否则df.columns中的x为x]
#将列年转换为索引
df = df.set_index('year')
打印(df)
城市2018 2019 2020总和
年
bj bj NaN NaN NaN NaN NaN
年bj 2018.0 2019.0 2020.0总和
价格bj 12.0 4.0 5.0 21
数量bj 5.0 5.0 3.0 13
NaN bj NaN NaN NaN NaN
sh NaN NaN NaN NaN
年sh 2018.0 2019.0 2020.0 sum
价格sh 5.0 6.0 7.0 18
数量sh 7.0 5.0 4.0 16 16
NaN sh NaN NaN NaN NaN
NaN sh NaN NaN NaN NaN
gz gz NaN NaN NaN NaN
年gz 2018.0 2019.0 2020.0总和
价格gz 2.0 3.0 1.0 6
数量gz 6.0 9.0 3.0 18
NaN gz NaN NaN NaN NaN
NaN gz NaN NaN NaN NaN
sz sNaN NaN NaN NaN
年sz 2018.0 2019.0 2020.0总和
价格sz 8.0 2.0 3.0 13
数量sz 5.0 4.0 3.0 12
df1 = df.loc ['price']。reset_index(drop = True)
打印(df1)
city 2018 2019 2020 sum
0 bj 12.0 4.0 5.0 21
1 sh 5.0 6.0 7.0 18
2 gz 2.0 3.0 1.0 6
3 sz 8.0 2.0 3.0 13
df2 = df.loc ['quantity']。reset_index(drop = True)
打印(df2)
城市2018 2019 2020总和
0 bj 5.0 5.0 3.0 13
1 sh 7.0 5.0 4.0 16
2 gz 6.0 9.0 3.0 18
3 sz 5.0 4.0 3.0 12
最后将 DataFrame
s写入现有文件是可以通过 mode ='a'
参数:
与pd.ExcelWriter('data1.xlsx',模式='a')作为作者:
df1.to_excel(writer,sheet_name ='price')
df2.to_excel(writer,sheet_name ='quantity')
I have a example excel file data1.xlsx
from here, which has a Sheet1
as follows:
Now I want to read it with openpyxl
or pandas
, then convert them into new df1
and df2
, I will finally save them as price
and quantity
sheet:
price sheet:
and quantity sheet
Code I have used:
df = pd.read_excel('./data1.xlsx', sheet_name = 'Sheet1')
df_list = np.split(df, df[df.isnull().all(1)].index)
for df in df_list:
print(df, '\n')
Out:
bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
0 year 2018.0 2019.0 2020.0 sum
1 price 12.0 4.0 5.0 21
2 quantity 5.0 5.0 3.0 13
bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
3 NaN NaN NaN NaN NaN
4 sh NaN NaN NaN NaN
5 year 2018.0 2019.0 2020.0 sum
6 price 5.0 6.0 7.0 18
7 quantity 7.0 5.0 4.0 16
bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
8 NaN NaN NaN NaN NaN
bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
9 NaN NaN NaN NaN NaN
10 gz NaN NaN NaN NaN
11 year 2018.0 2019.0 2020.0 sum
12 price 2.0 3.0 1.0 6
13 quantity 6.0 9.0 3.0 18
bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
14 NaN NaN NaN NaN NaN
bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
15 NaN NaN NaN NaN NaN
16 sz NaN NaN NaN NaN
17 year 2018.0 2019.0 2020.0 sum
18 price 8.0 2.0 3.0 13
19 quantity 5.0 4.0 3.0 12
How could I do that in Python? Thanks a lot.
解决方案
Use:
#add header=None for default columns names
df = pd.read_excel('./data1.xlsx', sheet_name = 'Sheet1', header=None)
#convert columns by second row
df.columns = df.iloc[1].rename(None)
#create new column `city` by forward filling non missing values by second column
df.insert(0, 'city', df.iloc[:, 0].mask(df.iloc[:, 1].notna()).ffill())
#convert floats to integers
df.columns = [int(x) if isinstance(x, float) else x for x in df.columns]
#convert column year to index
df = df.set_index('year')
print (df)
city 2018 2019 2020 sum
year
bj bj NaN NaN NaN NaN
year bj 2018.0 2019.0 2020.0 sum
price bj 12.0 4.0 5.0 21
quantity bj 5.0 5.0 3.0 13
NaN bj NaN NaN NaN NaN
sh sh NaN NaN NaN NaN
year sh 2018.0 2019.0 2020.0 sum
price sh 5.0 6.0 7.0 18
quantity sh 7.0 5.0 4.0 16
NaN sh NaN NaN NaN NaN
NaN sh NaN NaN NaN NaN
gz gz NaN NaN NaN NaN
year gz 2018.0 2019.0 2020.0 sum
price gz 2.0 3.0 1.0 6
quantity gz 6.0 9.0 3.0 18
NaN gz NaN NaN NaN NaN
NaN gz NaN NaN NaN NaN
sz sz NaN NaN NaN NaN
year sz 2018.0 2019.0 2020.0 sum
price sz 8.0 2.0 3.0 13
quantity sz 5.0 4.0 3.0 12
df1 = df.loc['price'].reset_index(drop=True)
print (df1)
city 2018 2019 2020 sum
0 bj 12.0 4.0 5.0 21
1 sh 5.0 6.0 7.0 18
2 gz 2.0 3.0 1.0 6
3 sz 8.0 2.0 3.0 13
df2 = df.loc['quantity'].reset_index(drop=True)
print (df2)
city 2018 2019 2020 sum
0 bj 5.0 5.0 3.0 13
1 sh 7.0 5.0 4.0 16
2 gz 6.0 9.0 3.0 18
3 sz 5.0 4.0 3.0 12
Last write DataFrame
s to existing file is possible by mode='a'
parameter, link:
with pd.ExcelWriter('data1.xlsx', mode='a') as writer:
df1.to_excel(writer, sheet_name='price')
df2.to_excel(writer, sheet_name='quantity')
这篇关于读取按nan行拆分的数据框,并在Python中将其重塑为多个数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!