I have a question similar to this one: Crosstab with multiple items. However, I am not trying to do it in R; I am trying to do it in Python pandas with crosstab.

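I have been trying to build a demographics table with the pandas crosstab function, but I can only handle one demographic variable at a time. In other words, I want a single crosstab in which all the row variables sit at the same level. Maybe this is not what crosstab is for, and something like a pandas pivot table would work better?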

Currently I use the following three lines of code, but I suspect there is some way to combine them:

import pandas as pd

# refQtrData is the question's source DataFrame; all three tables share the same nested column variables.
genderTable = pd.crosstab(refQtrData['GENDER'], [refQtrData['FUNDINGSOURCE'], refQtrData['PROVIDER'], refQtrData['LOCATION']], margins=True)
raceTable = pd.crosstab(refQtrData['RACETH4'], [refQtrData['FUNDINGSOURCE'], refQtrData['PROVIDER'], refQtrData['LOCATION']], margins=True)
ageTable = pd.crosstab(refQtrData['REFERRED'], [refQtrData['FUNDINGSOURCE'], refQtrData['PROVIDER'], refQtrData['LOCATION']], values=refQtrData['AGEREF'], aggfunc='mean')


What I would like to produce is:
[Image: Demographic Table]

Other miscellaneous information

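I originally did this in SPSS with the following code and am now trying to move it to Python. Just as SPSS CTABLES lets me combine multiple categorical and scale variables, I want several rows corresponding to different variables, without those variables having to sit at different levels.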

CTABLES
  /VLABELS VARIABLES= GENDER RACE AGE FUNDINGSOURCE PROVIDER LOCATION
    DISPLAY=LABEL
  /TABLE REFERRED [C][COUNT F40.0] + GENDER [C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + RACE
    [C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + AGE [S][MEAN] + AGE [S][MINIMUM, MAXIMUM]
    BY FUNDINGSOURCE [C] > PROVIDER [C] > LOCATION [C]
  /SLABELS VISIBLE=NO
  /CATEGORIES VARIABLES=GENDER RACE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=INCLUDE
  /CATEGORIES VARIABLES=FUNDINGSOURCE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=EXCLUDE
  /CATEGORIES VARIABLES=PROVIDER [1, 2] EMPTY=EXCLUDE
  /CATEGORIES VARIABLES=LOCATION [1, 2] EMPTY=EXCLUDE.
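As a rough pandas counterpart to one block of that CTABLES request (`GENDER [C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0]` BY the nested column variables), `pd.crosstab` gives the counts, and its `normalize="columns"` option gives the column percentages. This is a sketch under assumptions: the small DataFrame is hypothetical toy data (only the column names come from the question), and the "count (pct%)" string formatting is just one way to imitate the SPSS display.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
refQtrData = pd.DataFrame({
    "GENDER":        ["F", "M", "F", "F", "M", "M"],
    "FUNDINGSOURCE": ["A", "A", "A", "B", "B", "B"],
    "PROVIDER":      [1, 1, 2, 1, 1, 2],
    "LOCATION":      [1, 2, 1, 1, 1, 2],
})
cols = [refQtrData["FUNDINGSOURCE"], refQtrData["PROVIDER"], refQtrData["LOCATION"]]

# COUNT: how often each GENDER value occurs in each nested column combination.
counts = pd.crosstab(refQtrData["GENDER"], cols)

# COLPCT.COUNT: the same counts as a percentage of each column's total.
colpct = pd.crosstab(refQtrData["GENDER"], cols, normalize="columns") * 100

# Render each cell roughly like SPSS does: "count (pct%)".
formatted = (
    counts.astype(int).astype(str)
    + " ("
    + colpct.round(0).astype(int).astype(str)
    + "%)"
)
print(formatted)
```

Keeping the numeric `counts` and `colpct` tables separate, and only formatting at the end, leaves them available for further calculation.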

Best answer

Without a reproducible example, we can fall back on the pandas crosstab documentation; some of its example arrays are copied below.

import pandas as pd
import numpy as np

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar","bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"],dtype=object)
d = np.array(["1foo", "1foo", "1foo", "1foo", "1bar", "1bar","1bar", "1bar", "1foo", "1foo", "1foo"], dtype=object)


This gives four arrays. Now build the crosstabs; each call returns a DataFrame.

df1 = pd.crosstab(a, [b, c], rownames=['aa'], colnames=['b', 'c'])
df2 = pd.crosstab(d, [b, c], rownames=['aa'], colnames=['b', 'c'])


Then join the DataFrames with pandas.concat([...], axis=...):

>>> pd.concat([df1, df2], axis=0)
b     one        two
c    dull shiny dull shiny
aa
bar     1     2    1     0
foo     2     2    1     2
1bar    1     2    1     0
1foo    2     2    1     2

>>> pd.concat([df1, df2], axis=1)
b     one        two        one        two
c    dull shiny dull shiny dull shiny dull shiny
1bar  NaN   NaN  NaN   NaN  1.0   2.0  1.0   0.0
1foo  NaN   NaN  NaN   NaN  2.0   2.0  1.0   2.0
bar   1.0   2.0  1.0   0.0  NaN   NaN  NaN   NaN
foo   2.0   2.0  1.0   2.0  NaN   NaN  NaN   NaN
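Since both tables share the same column structure, stacking with `axis=0` is what puts several row variables at the same level. One possible refinement is to pass `keys=` to `pd.concat`, which labels each block with the variable it came from and yields a two-level row index. The key strings `"a"` and `"d"` and the level names `"variable"`/`"category"` below are arbitrary labels chosen for illustration.

```python
import numpy as np
import pandas as pd

# Arrays from the pandas crosstab documentation example, as used in the answer.
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"], dtype=object)
d = np.array(["1foo", "1foo", "1foo", "1foo", "1bar", "1bar", "1bar", "1bar", "1foo", "1foo", "1foo"], dtype=object)

df1 = pd.crosstab(a, [b, c], rownames=["aa"], colnames=["b", "c"])
df2 = pd.crosstab(d, [b, c], rownames=["aa"], colnames=["b", "c"])

# keys= adds an outer row level recording which variable each block came from;
# names= labels the two resulting row levels.
stacked = pd.concat([df1, df2], axis=0, keys=["a", "d"], names=["variable", "category"])
print(stacked)
```

With the outer level in place, a single block can be pulled back out with `stacked.loc["d"]`.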


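As for creating all three crosstabs with a single function call, write a function that takes the data and returns the concatenated crosstabs. I'm not sure it can be done as a reasonable one-liner.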
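A minimal sketch of such a function, assuming all row variables are categorical and share the same nested column variables. The helper name `stacked_crosstab`, its parameters, and the toy data (string codes for `RACETH4`) are hypothetical; only the column names come from the question.

```python
import pandas as pd


def stacked_crosstab(df, row_vars, col_vars, margins=True):
    """Hypothetical helper: one crosstab per row variable, stacked vertically.

    All tables share the same nested column variables, so pd.concat along
    axis=0 lines them up column for column.
    """
    cols = [df[c] for c in col_vars]
    tables = [pd.crosstab(df[r], cols, margins=margins) for r in row_vars]
    # keys= adds an outer row level naming the variable each block came from.
    return pd.concat(tables, keys=row_vars, names=["variable", "category"])


# Hypothetical toy data; only the column names come from the question.
refQtrData = pd.DataFrame({
    "GENDER":        ["F", "M", "F", "F", "M", "M"],
    "RACETH4":       ["r1", "r2", "r1", "r3", "r2", "r1"],
    "FUNDINGSOURCE": ["A", "A", "A", "B", "B", "B"],
    "PROVIDER":      [1, 1, 2, 1, 1, 2],
    "LOCATION":      [1, 2, 1, 1, 1, 2],
})

table = stacked_crosstab(
    refQtrData, ["GENDER", "RACETH4"], ["FUNDINGSOURCE", "PROVIDER", "LOCATION"]
)
print(table)
```

Because each block keeps its own "All" margin row, every row variable gets its own subtotal. If a row variable contains missing values, crosstab drops those rows by default, which can change that block's columns and introduce NaN after concatenation.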

From there, it is up to you to modify the result further or to join the DataFrames in some other way.
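One way to join in the scale statistics (mirroring `AGE [S][MEAN] + AGE [S][MINIMUM, MAXIMUM]` in the CTABLES syntax, or the question's `ageTable`) is to group by the column variables, aggregate, and transpose, so the statistics become rows under the same column index as the counts. This is a sketch under assumptions: the toy data and the outer labels `"GENDER"`/`"AGE"` are hypothetical, and it uses `groupby` rather than `crosstab` for the statistics.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
refQtrData = pd.DataFrame({
    "GENDER":        ["F", "M", "F", "F", "M", "M"],
    "AGE":           [30, 40, 50, 20, 60, 35],
    "FUNDINGSOURCE": ["A", "A", "A", "B", "B", "B"],
    "PROVIDER":      [1, 1, 2, 1, 1, 2],
    "LOCATION":      [1, 2, 1, 1, 1, 2],
})
col_vars = ["FUNDINGSOURCE", "PROVIDER", "LOCATION"]

# Category counts for one row variable, as in the question.
counts = pd.crosstab(refQtrData["GENDER"], [refQtrData[c] for c in col_vars])

# Scale statistics per column combination; .T turns mean/min/max into rows.
age_stats = refQtrData.groupby(col_vars)["AGE"].agg(["mean", "min", "max"]).T

# Both blocks share the (FUNDINGSOURCE, PROVIDER, LOCATION) column index.
table = pd.concat([counts, age_stats], keys=["GENDER", "AGE"], names=["variable", "category"])
print(table)
```

Concatenating integer counts with float means upcasts the shared columns to float, so any display formatting is best applied at the end.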

Regarding "python - Python crosstab multiple variables or rows; demographic table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55168294/

10-17 00:39