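I am trying to create a MultiIndex for a DataFrame based on two columns (plant and date).
I want the 'plant' column to be the first index level, followed by the date.
It works, but for some reason the dates are not "collapsed" into one cell, as you can see here:
My code: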
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_plants = pd.read_csv('Data_plants_26_11_2019.csv')
df_Nit=pd.read_csv('chemometrics.csv')
# create new columns containing only the hour and only the date, using lambda
df_plants['Hour']=df_plants['time'].apply(lambda time: time.split(' ')[1])
df_plants['date']=df_plants['time'].apply(lambda time: time.split(' ')[0])
# select only the plants whose nitrogen content was checked
options=['J01B','J01C','J02C','J02D','J03B','J03C','J04C','J08C','J08D','J09A','J09C','J10A','J12C','J12D','J13A','J14A','J15A','J18A']
filter_plants=df_plants.loc[df_plants['plant'].isin(options)].copy()
filter_plants['Hour'] = pd.to_datetime(filter_plants['Hour']).apply(lambda x: str(x.hour) + ':00')
# index by plant and date
df_indices.set_index(['plant', 'date'], inplace=True)
df_indices.sort_index(inplace=True)
df_indices
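My end goal: to have identical dates shown in a single cell.
Best answer
This is the expected output for a MultiIndex. When printing, pandas "removes" (in reality, just does not display) repeated values in every level except the last one, so here only repeats in the first level are hidden.
If you create a DataFrame with a three-level index, it is displayed the way you want: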
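To check that the repeated labels are only hidden in the printout and not dropped from the index, the levels can be read back directly. This is a minimal sketch on a small made-up frame; the names `demo`, `A`, `B` and `C` are illustrative assumptions, not the asker's data:

```python
import pandas as pd

# Small made-up frame; the column names are illustrative only.
demo = pd.DataFrame({
    'A': list('aabb'),
    'B': list('eeff'),
    'C': [1, 2, 3, 4],
}).set_index(['A', 'B'])

# Every row still carries its full (A, B) label,
# even where print() leaves the first level blank.
first_level = demo.index.get_level_values('A').tolist()
full_labels = demo.index.tolist()
print(first_level)
print(full_labels)
```
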
df_indices.set_index(['plant', 'date', 'Hour'], inplace=True)
df_indices = pd.DataFrame({
'A':list('aaabbb'),
'B':list('eeffee'),
'C':[1,3,5,7,1,0],
'D':[5,3,6,9,2,4]
})
df_indices.set_index(['A', 'B'], inplace=True)
print (df_indices)
     C  D
A B
a e  1  5
  e  3  3
  f  5  6
b f  7  9
  e  1  2
  e  0  4
# temporarily display the DataFrame with multi_sparse turned off (how the data really are)
with pd.option_context('display.multi_sparse', False):
    print (df_indices)
     C  D
A B
a e  1  5
a e  3  3
a f  5  6
b f  7  9
b e  1  2
b e  0  4
df_indices = pd.DataFrame({
'A':list('aaabbb'),
'B':list('eeffee'),
'C':[1,3,5,7,1,0],
'D':[5,3,6,9,2,4]
})
df_indices.set_index(['A', 'B', 'C'], inplace=True)
print (df_indices)
       D
A B C
a e 1  5
    3  3
  f 5  6
b f 7  9
  e 1  2
    0  4
# temporarily display the DataFrame with multi_sparse turned off (how the data really are)
with pd.option_context('display.multi_sparse', False):
    print (df_indices)
       D
A B C
a e 1  5
a e 3  3
a f 5  6
b f 7  9
b e 1  2
b e 0  4
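Applied to the column names from the question, the same idea might look like the sketch below. The rows are invented stand-ins for the CSV data (the `value` column and all values are assumptions), so treat it as an illustration rather than the asker's actual result:

```python
import pandas as pd

# Hypothetical rows standing in for the asker's CSV; all values are invented.
plants = pd.DataFrame({
    'plant': ['J01B', 'J01B', 'J01B', 'J01C'],
    'date': ['2019-11-26', '2019-11-26', '2019-11-27', '2019-11-26'],
    'Hour': ['9:00', '10:00', '9:00', '9:00'],
    'value': [0.1, 0.2, 0.3, 0.4],
})

# With Hour as the last level, repeated dates within a plant are hidden when printing.
indexed = plants.set_index(['plant', 'date', 'Hour']).sort_index()
print(indexed)

# sparsify=False prints every index label on every row.
dense_text = indexed.to_string(sparsify=False)
print(dense_text)
```

Note that `Hour` is a string column here, so `sort_index` orders it lexicographically ('10:00' sorts before '9:00').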