Problem description
I have a problem filling a DataFrame. This is the initial situation (picture 1):
My code produces this (picture 2):
But I want this (picture 3):
So if a block of rows from -1 to 4 is completely empty, it should stay empty. But if the block contains any numbers, the missing values should be filled with 0.
My code looks like this:
import pandas as pd
df = pd.read_csv('/Users/Hanna/Code/ZERO.csv')
indx = df[df['D'] == -1].index.values
for i, j in zip(indx[:-1], indx[1:]):
    df.loc[i:j-1, 'E'] = df.loc[i:j-1, 'E'].fillna(0)
    if j == indx[-1]:
        df.loc[j:, 'E'] = df.loc[j:, 'E'].fillna(0)
That is my code, but I am not sure about the 'NaN' values.
d = {'A': [4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074,
           4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074],
     'B': ['SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746'],
     # trimmed to 18 entries so that every column has the same length
     'C': [-1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4],
     'D': [0, 1000, 1000, 0, 0, 0, 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 0, 0, 0, 3000, 3000, 0],
     'E': [2000, 2000, 2000, 2000, 2000, 2000, 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN',
           4000, 4000, 4000, 4000, 4000, 4000]}
Thank you Hanna
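On the 'NaN' doubt: in a hand-typed dictionary like the one above, 'NaN' is a plain string, so pandas does not treat it as a missing value and fillna() will not touch it. (With default settings, read_csv does parse the text NaN in a file as missing, so this mainly matters for data typed in by hand.) The sketch below uses a small made-up column, not the real data, to show the difference and one way to convert the strings.

```python
import pandas as pd

# Hypothetical miniature of column E: the text 'NaN' mixed with numbers.
raw = pd.DataFrame({'E': [2000, 'NaN', 'NaN', 4000]})

# The string 'NaN' is ordinary text, so it is not counted as missing.
n_missing_before = raw['E'].isna().sum()
print(n_missing_before)

# Convert to a numeric column; the 'NaN' strings become real missing values.
clean = raw.assign(E=pd.to_numeric(raw['E'], errors='coerce'))
n_missing_after = clean['E'].isna().sum()
print(n_missing_after)
```

After the conversion, isna() and fillna() see those rows as missing.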
Maybe it does not work because I do this beforehand with another column, F:
indx = df[df['Diff Load Due Week'] == -1].index.values
for i, j in zip(indx[:-1], indx[1:]):
    df.loc[i:j-1, 'F'] = df.loc[i:j-1, 'F'].max()
    if j == indx[-1]:
        df.loc[j:, 'F'] = df.loc[j:, 'F'].max()
Could it be that I have to remove the index first?
This is my latest code:
base_list = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
df_c = pd.MultiIndex.from_product([
    [4000074],
    ["SP000796746", "SP001811642"],
    [201824, 201828, 201832, 201835, 201837, 201839, 201845, 201850, 201910, 201918, 201922, 201926, 201909, 201916, 201918, 201920],
    base_list],
    names=["A", "B", "C", "D"]).to_frame(index=False)
# Merge the new list with the small raw-data list
df_3 = pd.merge(df_c, df_1, how='outer')
# Save the merged data to Excel and CSV for checking and further work
df_3.to_csv('GROß.csv')
df_3.to_excel('GROß.xlsx')
# Read in the newly created CSV
df = pd.read_csv('/Users/Hanna/Desktop/Daten Projektseminar/Coding/GROß.csv')
# Get the index of the -1 rows so that columns and rows can be filled
indx = df[df['D'] == -1].index.values
# Fill the billings with the maximum value
for i, j in zip(indx[:-1], indx[1:]):
    df.loc[i:j-1, 'F'] = df.loc[i:j-1, 'F'].max()
    if j == indx[-1]:
        df.loc[j:, 'F'] = df.loc[j:, 'F'].max()
OK, this should do the trick now:
import pandas as pd
import numpy as np
# your data
d = {'A': [4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074,
           4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074, 4000074],
     'B': ['SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746'],
     'C': [201926, 201926, 201926, 201926, 201926, 201926, 201909, 201909, 201909, 201909, 201909,
           201909, 201933, 201933, 201933, 201933, 201933, 201933],
     'D': [-1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4],
     'E': [np.nan, 1000, 1000, 0, 0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
           np.nan, np.nan, np.nan, 3000, 3000, np.nan]}
# create data frame
df = pd.DataFrame(data=d)
# sum column E per group of ID columns (an all-NaN group sums to 0)
groupedSum = df.groupby(['A', 'B', 'C'])['E'].sum().reset_index()
# loop over unique IDs
for i, row in groupedSum.iterrows():
    # define values
    idValue = groupedSum.at[i, 'C']
    sumValue = groupedSum.at[i, 'E']
    # if the sum is not zero, the group contains numbers
    if sumValue != 0:
        # keep values greater than zero; set everything else (including NaN) to zero
        mask = df['C'] == idValue
        df.loc[mask, 'E'] = df.loc[mask, 'E'].apply(lambda x: x if x > 0 else 0)
print(df)
          A            B       C  D       E
0   4000074  SP000796746  201926 -1     0.0
1   4000074  SP000796746  201926  0  1000.0
2   4000074  SP000796746  201926  1  1000.0
3   4000074  SP000796746  201926  2     0.0
4   4000074  SP000796746  201926  3     0.0
5   4000074  SP000796746  201926  4     0.0
6   4000074  SP000796746  201909 -1     NaN
7   4000074  SP000796746  201909  0     NaN
8   4000074  SP000796746  201909  1     NaN
9   4000074  SP000796746  201909  2     NaN
10  4000074  SP000796746  201909  3     NaN
11  4000074  SP000796746  201909  4     NaN
12  4000074  SP000796746  201933 -1     0.0
13  4000074  SP000796746  201933  0     0.0
14  4000074  SP000796746  201933  1     0.0
15  4000074  SP000796746  201933  2  3000.0
16  4000074  SP000796746  201933  3  3000.0
17  4000074  SP000796746  201933  4     0.0
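The same result can probably be reached without an explicit loop. The sketch below is an alternative to the answer's approach, not a restatement of it: it counts the non-missing E values per group with groupby(...).transform('count') and fills NaN only where that count is positive. The shortened frame and the names filled_count and has_number are made up for illustration, and the grouping uses column C alone, on the assumption that C identifies each block.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the data: three groups of three rows.
df = pd.DataFrame({
    'C': [201926] * 3 + [201909] * 3 + [201933] * 3,
    'D': [-1, 0, 1] * 3,
    'E': [np.nan, 1000, 0, np.nan, np.nan, np.nan, np.nan, 3000, np.nan],
})

# Number of non-missing E values in each row's group.
filled_count = df.groupby('C')['E'].transform('count')

# Fill NaN with 0 only in groups that hold at least one number.
has_number = filled_count > 0
df.loc[has_number, 'E'] = df.loc[has_number, 'E'].fillna(0)
print(df)
```

One difference from the sum-based check: a group that contains only zeros and NaN is also filled here, and existing negative values are left unchanged. Whether that is the desired behavior depends on the data.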