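I'm trying to create a MultiIndex for my DataFrame based on 2 columns (plant and date).
I want the 'plant' column to be the first level, followed by the date.
It works, but for some reason the repeated dates are not "merged" into a single cell, as you can see here: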

[Screenshot: MultiIndex output]

My code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_plants = pd.read_csv('Data_plants_26_11_2019.csv')
df_Nit=pd.read_csv('chemometrics.csv')

# create new columns containing only the hour and only the date, using lambda
df_plants['Hour']=df_plants['time'].apply(lambda time: time.split(' ')[1])
df_plants['date']=df_plants['time'].apply(lambda time: time.split(' ')[0])

# select only plants whose nitrogen content was checked
options=['J01B','J01C','J02C','J02D','J03B','J03C','J04C','J08C','J08D','J09A','J09C','J10A','J12C','J12D','J13A','J14A','J15A','J18A']
filter_plants=df_plants.loc[df_plants['plant'].isin(options)].copy()

filter_plants['Hour'] = pd.to_datetime(filter_plants['Hour']).apply(lambda x: str(x.hour) + ':00')


# index by plant and date
df_indices = filter_plants.copy()
df_indices.set_index(['plant', 'date'], inplace=True)
df_indices.sort_index(inplace=True)
df_indices
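The `date` and `Hour` columns can also be derived without `split` and `apply`, by parsing `time` once with `pd.to_datetime` and using the vectorized `.dt` accessor. This is a minimal sketch, assuming `time` holds strings like `'2019-11-26 13:45:00'`; the sample rows and the shortened `options` list below are hypothetical stand-ins for the CSV data.

```python
import pandas as pd

# Hypothetical sample resembling Data_plants_26_11_2019.csv
df_plants = pd.DataFrame({
    'plant': ['J01B', 'J01B', 'J01C', 'X99Z'],
    'time': ['2019-11-25 09:10:00', '2019-11-25 13:45:00',
             '2019-11-26 09:05:00', '2019-11-26 10:00:00'],
})

# Hypothetical, shortened list of plants whose nitrogen content was checked
options = ['J01B', 'J01C']

# Parse the timestamps once, then extract parts with the .dt accessor
ts = pd.to_datetime(df_plants['time'])
df_plants['date'] = ts.dt.strftime('%Y-%m-%d')
# Same format as str(x.hour) + ':00' in the question (no zero padding)
df_plants['Hour'] = ts.dt.hour.astype(str) + ':00'

filter_plants = df_plants.loc[df_plants['plant'].isin(options)].copy()
print(filter_plants[['plant', 'date', 'Hour']])
```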



My end goal: show identical dates only once, in a single cell.

Best answer

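This "failure" is the expected output of a MultiIndex: when displaying it, pandas "removes" repeated labels (they are only hidden, not actually deleted) in every level except the last one, so here only in the first level.

If you create the index with 3 levels, it is displayed the way you want: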
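A quick way to confirm that the repeated labels are only hidden is to read them back from the index and to print with `to_string(sparsify=False)`, which disables the sparse display for that one call. This is a minimal sketch on hypothetical plant/date rows; note that the date, being the last level, is printed on every row either way, which is exactly the behaviour described in the question.

```python
import pandas as pd

# Hypothetical two-level example mirroring the question's index
df = pd.DataFrame({
    'plant': ['J01B', 'J01B', 'J01C'],
    'date': ['2019-11-25', '2019-11-25', '2019-11-26'],
    'value': [1, 2, 3],
}).set_index(['plant', 'date'])

# Every row still carries its full label in every level;
# only the printed representation blanks out repeats.
print(df.index.get_level_values('plant').tolist())
print(df.index.get_level_values('date').tolist())

print(df.to_string())                 # sparse: 'J01B' shown once
print(df.to_string(sparsify=False))   # dense: 'J01B' shown on each row
```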

df_indices.set_index(['plant', 'date', 'Hour'], inplace=True)
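Applied to data shaped like the question's, a three-level index makes `date` a non-final level, so repeats within the same plant are blanked out. This is a sketch on hypothetical rows (the `N` column stands in for whatever measurements the real CSV holds).

```python
import pandas as pd

# Hypothetical rows: several readings per plant and date, at different hours
df_indices = pd.DataFrame({
    'plant': ['J01B', 'J01B', 'J01B', 'J01C'],
    'date': ['2019-11-25', '2019-11-25', '2019-11-26', '2019-11-26'],
    'Hour': ['9:00', '13:00', '9:00', '9:00'],
    'N': [1.2, 1.4, 1.1, 0.9],
})

df_indices.set_index(['plant', 'date', 'Hour'], inplace=True)
df_indices.sort_index(inplace=True)
print(df_indices)
```

Because `Hour` holds strings, `sort_index` orders it lexicographically, so `'13:00'` comes before `'9:00'`; zero-padding the hour (for example with `'%H:00'`) or storing it as an integer keeps chronological order.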




df_indices = pd.DataFrame({
        'A':list('aaabbb'),
        'B':list('eeffee'),
        'C':[1,3,5,7,1,0],
        'D':[5,3,6,9,2,4]
})

df_indices.set_index(['A', 'B'], inplace=True)
print (df_indices)
     C  D
A B
a e  1  5
  e  3  3
  f  5  6
b f  7  9
  e  1  2
  e  0  4

# temporarily disable multi_sparse to display the index as the data really are
with pd.option_context('display.multi_sparse', False):
    print (df_indices)
         C  D
    A B
    a e  1  5
    a e  3  3
    a f  5  6
    b f  7  9
    b e  1  2
    b e  0  4




df_indices = pd.DataFrame({
        'A':list('aaabbb'),
        'B':list('eeffee'),
        'C':[1,3,5,7,1,0],
        'D':[5,3,6,9,2,4]
})

df_indices.set_index(['A', 'B', 'C'], inplace=True)
print (df_indices)
       D
A B C
a e 1  5
    3  3
  f 5  6
b f 7  9
  e 1  2
    0  4

# temporarily disable multi_sparse to display the index as the data really are
with pd.option_context('display.multi_sparse', False):
    print (df_indices)
           D
    A B C
    a e 1  5
    a e 3  3
    a f 5  6
    b f 7  9
    b e 1  2
    b e 0  4

10-07 21:53