Question
Is there a way to express the pandas operations below using the pipe operator?
df_a = df[df.index.year != 2000]
df_b = df_a[(df_a['Month'].isin([3, 4, 5])) & (df_a['region'] == 'USA')]
Recommended answer
I'm not sure why you would want to use pipe for this operation.
pipe is intended to simplify the syntax for chained processing of a DataFrame, where each function in the chain takes the incoming DataFrame and returns a modified one (see the docs).
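As a rough sketch of the kind of chaining pipe is meant for, the example below compares nested function calls with the piped form. The helpers add_month and keep_region are hypothetical names made up for this illustration, and the small frame is hand-built.

```python
import pandas as pd

# Hypothetical helpers: each takes a DataFrame and returns a new one.
def add_month(frame):
    # Derive a 'Month' column from the DatetimeIndex
    return frame.assign(Month=frame.index.month)

def keep_region(frame, region):
    # Keep only the rows for the given region
    return frame[frame['region'] == region]

index = pd.to_datetime(['2014-01-31', '2014-02-28', '2014-03-31', '2014-04-30'])
df = pd.DataFrame({'region': ['USA', 'Non-USA', 'USA', 'USA']}, index=index)

# Nested calls read inside-out...
nested = keep_region(add_month(df), 'USA')
# ...while pipe reads left to right; extra arguments are passed through.
piped = df.pipe(add_month).pipe(keep_region, 'USA')

print(piped)
```

Both forms produce the same frame; pipe only changes how the chain is written.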
What you are trying to do is filter a DataFrame with several filters (or masks).
Just to illustrate that using pipe for this operation is somewhat cumbersome:
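import numpy as np
import pandas as pd

# Seed NumPy's random generator so the sample data is reproducible
np.random.seed(123)
# Generate some month-end data (newer pandas versions prefer freq='ME')
dates = pd.date_range('2014-01-01', '2015-12-31', freq='M')
df = pd.DataFrame({'region': np.random.choice(['USA', 'Non-USA'], len(dates))}, index=dates)
df['Month'] = df.index.month
print(df.head())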
             region  Month
2014-01-31      USA      1
2014-02-28  Non-USA      2
2014-03-31      USA      3
2014-04-30      USA      4
2014-05-31      USA      5
Your original filter (with the year changed to 2014 to match the sample data) would yield:
df_a = df[df.index.year != 2014]
df_b = df_a[(df_a['Month'].isin([3, 4, 5])) & (df_a['region'] == 'USA')]
print(df_b)
           region  Month
2015-03-31    USA      3
2015-05-31    USA      5
Here is how you could use pipe to get the same output:
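def masker(df, mask):
    # Return only the rows where mask is True
    return df[mask]

mask1 = df.index.year != 2014
mask2 = df['Month'].isin([3, 4, 5])
mask3 = df['region'] == 'USA'
# mask2 and mask3 are built on the unfiltered df, so pandas realigns them
# to the already-filtered frame (and may emit a reindexing UserWarning)
print(df.pipe(masker, mask1).pipe(masker, mask2).pipe(masker, mask3))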
           region  Month
2015-03-31    USA      3
2015-05-31    USA      5
However, in this particular case pandas can handle the filtering much more simply by combining the masks:
print(df[mask1 & mask2 & mask3])
           region  Month
2015-03-31    USA      3
2015-05-31    USA      5
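If a chained style is still preferred, one possible variant is to pipe functions that compute each mask from the frame they receive, so that no mask is built against the unfiltered df. This is only a sketch: drop_year, keep_months, and keep_region are hypothetical helper names, and the small frame is hand-built rather than the random data used earlier.

```python
import pandas as pd

# Hypothetical helpers: each builds its mask from the frame it is given.
def drop_year(frame, year):
    return frame[frame.index.year != year]

def keep_months(frame, months):
    return frame[frame['Month'].isin(months)]

def keep_region(frame, region):
    return frame[frame['region'] == region]

index = pd.to_datetime(['2014-03-31', '2015-03-31', '2015-04-30', '2015-06-30'])
df = pd.DataFrame({'region': ['USA', 'USA', 'Non-USA', 'USA']}, index=index)
df['Month'] = df.index.month

# Each step filters the output of the previous one
result = (
    df.pipe(drop_year, 2014)
      .pipe(keep_months, [3, 4, 5])
      .pipe(keep_region, 'USA')
)

# The single combined mask gives the same rows
combined = df[
    df['Month'].isin([3, 4, 5])
    & (df['region'] == 'USA')
    & (df.index.year != 2014)
]

print(result)
```

For a plain row filter like this one, the combined mask remains the shorter option; the piped form mainly pays off when the steps are reused or do more than filtering.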