Julia DataFrame group by and pivot table functions

Question

How do you do group by and pivot tables with Julia DataFrames?

Let's say I have this DataFrame:

using DataFrames

df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"],
               Address = ["12 Silver", "10 Fak", "12 Silver", "1 North", "10 Fak", "2 Fake", "1 Red", "1 Dog", "2 Fake", "1 White"],
               Score = ["4", "5", "3", "2", "1", "5", "4", "3", "2", "1"])

I would like to do the following:

1) a pivot table with Location and Class, which should output

Class     H  L  M
Location
DC        0  0  1
NY        2  1  0
SF        1  2  0
TX        1  2  0

2) group by "Location" and count the number of records in each group, which should output

   Pop
DC  1
NY  3
SF  3
TX  3

Recommended answer

You can use unstack to get most of the way there (DataFrames don't have an index, so Class has to remain a column, whereas in pandas it would be an Index). This seems to be DataFrames.jl's answer to pivot_table:

julia> unstack(df, :Location, :Class, :Score)
WARNING: Duplicate entries in unstack.
4x4 DataFrames.DataFrame
| Row | Class | H   | L   | M   |
|-----|-------|-----|-----|-----|
| 1   | "DC"  | NA  | NA  | "1" |
| 2   | "NY"  | "3" | "2" | NA  |
| 3   | "SF"  | "5" | "1" | NA  |
| 4   | "TX"  | "4" | "2" | NA  |

I'm not sure how you would do the equivalent of fillna here (unstack doesn't have this option)...
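As a hedged sketch of how the count-style pivot from the question might be produced: assuming a recent DataFrames.jl 1.x release (not the version used in the answer), in which unstack accepts a `fill` keyword, one can first count rows per (Location, Class) pair and then unstack those counts. The column name `n` and the variable names `counts` and `pivot` are illustrative choices, and the Address and Score columns are omitted because they are not needed for counting.

```julia
using DataFrames

df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"])

# Count rows for each (Location, Class) pair; sort=true orders the groups by key.
counts = combine(groupby(df, [:Location, :Class]; sort=true), nrow => :n)

# Spread the Class values into columns; combinations that never occur become 0.
# Assumption: the installed DataFrames.jl version supports the `fill` keyword.
pivot = unstack(counts, :Location, :Class, :n; fill=0)

# Reorder the class columns alphabetically so they read H, L, M.
pivot = select(pivot, "Location", sort(names(pivot, Not(:Location)))...)
```

Counting first and unstacking second also avoids the duplicate-entries warning, since each (Location, Class) pair then appears only once.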

You can do the group by using by with the nrow (number of rows) function:

julia> by(df, :Location, nrow)
4x2 DataFrames.DataFrame
| Row | Location | x1 |
|-----|----------|----|
| 1   | "DC"     | 1  |
| 2   | "NY"     | 3  |
| 3   | "SF"     | 3  |
| 4   | "TX"     | 3  |
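In DataFrames.jl 1.x the `by` function is no longer available, so the call above would need adapting. A minimal sketch of the same group count, assuming a 1.x release and using the question's column name `Pop` for the result (the variable name `pop` is illustrative):

```julia
using DataFrames

df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"])

# Group by Location (sorted by key) and count the rows in each group,
# storing the count in a column named Pop.
pop = combine(groupby(df, :Location; sort=true), nrow => :Pop)
```

The `nrow => :Pop` pair names the output column directly, which replaces the auto-generated `x1` seen in the older output.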