
Problem description

I've organized my data using pandas, and my procedure looks like this:

import pandas as pd
import numpy as np
df1 = pd.read_table(r'E:\빅데이터 캠퍼스\골목상권 프로파일링 - 서울 열린데이터 광장 3.초기-16년5월분1\17.상권-추정매출\201301-201605\tbsm_trdar_selng.txt\tbsm_trdar_selng_utf8.txt' , sep='|' ,header=None
,dtype = { '0' : pd.np.int})

df1 = df1.replace('201301', int(201301))

df2 = df1[[0 ,1, 2, 3 ,4, 11,12 ,82 ]]

df2_rename = df2.columns = ['STDR_YM_CD', 'TRDAR_CD', 'TRDAR_CD_NM', 'SVC_INDUTY_CD', 'SVC_INDUTY_CD_NM', 'THSMON_SELNG_AMT', 'THSMON_SELNG_CO', 'STOR_CO'  ]

print(df2.head(40)) 

df3_groupby = df2.groupby(['STDR_YM_CD', 'TRDAR_CD' ])
df4_agg = df3_groupby.agg(np.sum)

print(df4_agg.head(30))

When I print df2, I can see both 11947 and 11948 in the TRDAR_CD column, as the screenshot in the original post shows.

After that I used groupby, and the 11948 values disappeared from the TRDAR_CD column. The second screenshot in the original post shows this.

Could this problem come from the warning message? The warning is 'sys:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.'

Please help me.

The output of print(df2.info()) is:

RangeIndex: 1089023 entries, 0 to 1089022
Data columns (total 8 columns):
STDR_YM_CD          1089023 non-null object
TRDAR_CD            1089023 non-null int64
TRDAR_CD_NM         1085428 non-null object
SVC_INDUTY_CD       1089023 non-null object
SVC_INDUTY_CD_NM    1089023 non-null object
THSMON_SELNG_AMT    1089023 non-null int64
THSMON_SELNG_CO     1089023 non-null int64
STOR_CO             1089023 non-null int64
dtypes: int64(4), object(4)
memory usage: 66.5+ MB
None

Recommended answer

After the groupby, the first and second columns form a MultiIndex. When the first level contains duplicates, pandas by default "sparsifies" the higher levels of the index to make the console output a bit easier on the eyes, so a repeated value is printed only once.

You can show every value in the first level of the MultiIndex by setting display.multi_sparse to False.

Sample:

df = pd.DataFrame({'A':[1,1,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})

df.set_index(['A','B'], inplace=True)

print (df)
     C
A B   
1 4  7
  5  8
3 6  9

#temporary set multi_sparse to False
#http://pandas.pydata.org/pandas-docs/stable/options.html#getting-and-setting-options
with pd.option_context('display.multi_sparse', False):
    print (df)
     C
A B   
1 4  7
1 5  8
3 6  9
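
If the goal is only to confirm that no rows were dropped, the index can also be inspected directly instead of relying on the printed table. The following is a small sketch using the same toy frame as the sample above; the variable names level_a and flat are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]}).set_index(['A', 'B'])

# The sparsified display prints the repeated 1 only once,
# but the index itself still holds one entry per row.
level_a = df.index.get_level_values('A').tolist()
print(level_a)

# A specific (A, B) pair can be looked up directly.
print((1, 5) in df.index)

# reset_index() turns the index levels back into ordinary columns,
# so every value is printed on every row.
flat = df.reset_index()
print(flat)
```

Checking membership or flattening the index this way does not depend on any display option.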

Edit, after the question was updated:

I think the problem is that the value 11948 is of type string, so it is omitted.
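
To illustrate how mixed types in a grouping key can affect the result, here is a minimal sketch. It assumes the mixed types sit in the first column (STDR_YM_CD), as the DtypeWarning in the question suggests; the three rows are made up for the example and are not taken from the real file.

```python
import pandas as pd

# Hypothetical miniature of the data: the month code is stored
# partly as str and partly as int, as in a mixed-type column.
df = pd.DataFrame({
    "STDR_YM_CD": pd.Series(["201301", 201301, 201301], dtype=object),
    "TRDAR_CD": [11947, 11947, 11948],
    "STOR_CO": [1, 2, 3],
})

# '201301' (str) and 201301 (int) are different keys,
# so the same month is split across separate groups.
mixed_groups = df.groupby(["STDR_YM_CD", "TRDAR_CD"], sort=False).ngroups

# Converting the key to a single integer dtype merges them again.
df["STDR_YM_CD"] = pd.to_numeric(df["STDR_YM_CD"])
clean = df.groupby(["STDR_YM_CD", "TRDAR_CD"])["STOR_CO"].sum()

print(mixed_groups)
print(clean)
```

With a mixed key, rows for one month can end up far apart in the aggregated output, so head() may show only part of them.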

Edit, after checking the file:

You can simplify your solution by adding the usecols parameter to read_csv (read_table accepts it as well) and then aggregating with GroupBy.sum:

import pandas as pd
import numpy as np

df2 = pd.read_table(r'tbsm_trdar_selng_utf8.txt' , 
                    sep='|' ,
                    header=None ,
                    usecols=[0 ,1, 2, 3 ,4, 11,12 ,82],
                    names=['STDR_YM_CD', 'TRDAR_CD', 'TRDAR_CD_NM', 'SVC_INDUTY_CD', 'SVC_INDUTY_CD_NM', 'THSMON_SELNG_AMT', 'THSMON_SELNG_CO', 'STOR_CO'],
                    dtype = { '0' : int})


df4_agg = df2.groupby(['STDR_YM_CD', 'TRDAR_CD' ]).sum()
print(df4_agg.head(10))
                     THSMON_SELNG_AMT  THSMON_SELNG_CO  STOR_CO
STDR_YM_CD TRDAR_CD                                            
201301     11947           1966588856            74798       73
           11948           3404215104            89064      116
           11949           1078973946            42005       45
           11950           1759827974            93245       71
           11953            779024380            21042       84
           11954           2367130386            94033      128
           11956            511840921            23340       33
           11957            329738651            15531       50
           11958           1255880439            42774      118
           11962           1837895919            66692       68
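
The warning quoted in the question suggests specifying dtype on import or setting low_memory=False. The sketch below shows both on a tiny in-memory stand-in for the file. The three rows are invented, and keying dtype by the name passed in names (rather than by '0') is my assumption about the intent of the code above, not something confirmed against the real data.

```python
import io
import pandas as pd

# Hypothetical stand-in for the pipe-separated file (made-up rows).
raw = "201301|11947|100\n201301|11948|200\n201301|11948|50\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    header=None,
    names=["STDR_YM_CD", "TRDAR_CD", "THSMON_SELNG_AMT"],
    dtype={"STDR_YM_CD": "int64"},  # keyed by the column name given in names
    low_memory=False,               # infer types from the whole file, not per chunk
)

totals = df.groupby(["STDR_YM_CD", "TRDAR_CD"])["THSMON_SELNG_AMT"].sum()
print(df.dtypes)
print(totals)
```

If the real column contains values that cannot be parsed as integers, forcing an integer dtype raises an error at read time, which at least points to the offending data.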
