So I currently have a pipeline with a lot of custom transformers:

from sklearn.pipeline import Pipeline

# TimeTransformer, ZipTransformer and GroupByTransformer are my own classes (not shown)
p = Pipeline([
    ("GetTimeFromDate", TimeTransformer("Date")),                # custom transformer that adds a "time" column
    ("GetZipFromAddress", ZipTransformer("Address")),            # custom transformer that adds a "zip" column
    ("GroupByTimeandZip", GroupByTransformer(["time", "zip"])),  # custom transformer that adds one-hot columns
])


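Each transformer takes a pandas DataFrame and returns the same DataFrame with one or more new columns. This actually works quite well, but how can I run the "GetTimeFromDate" and "GetZipFromAddress" steps in parallel?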
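The transformers' code is not shown in the question. A hypothetical TimeTransformer that follows the described contract (DataFrame in, same DataFrame plus one new column out) might look like the sketch below; it assumes "Date" holds parseable timestamps and that "time" means hour of day, both of which are guesses made only for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class TimeTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the question's transformer: returns a copy of
    the input DataFrame with an extra "time" column (hour of day from `col`)."""

    def __init__(self, col):
        self.col = col

    def fit(self, X, y=None):
        # Nothing to learn; fit exists so Pipeline can call fit/fit_transform.
        return self

    def transform(self, X):
        # assign() builds a new DataFrame, so the caller's X is left untouched.
        return X.assign(time=pd.to_datetime(X[self.col]).dt.hour)


df = pd.DataFrame({"Date": ["2016-04-15 08:30", "2016-04-15 17:05"]})
out = TimeTransformer("Date").fit_transform(df)
print(list(out.columns))  # ['Date', 'time']
```

Chained in a Pipeline, each such step receives the previous step's output, which is why the steps run one after another rather than side by side.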

I thought of using FeatureUnion:

from sklearn.pipeline import FeatureUnion, Pipeline

f = FeatureUnion([
    ("GetTimeFromDate", TimeTransformer("Date")),      # custom transformer that adds a "time" column
    ("GetZipFromAddress", ZipTransformer("Address")),  # custom transformer that adds a "zip" column
])

p = Pipeline([
    ("FeatureUnionStep", f),
    ("GroupByTimeandZip", GroupByTransformer(["time", "zip"])),  # custom transformer that adds one-hot columns
])


The problem is that FeatureUnion returns a numpy.ndarray, but the "GroupByTimeandZip" step expects a DataFrame.
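A quick way to see this behaviour, using scikit-learn's built-in `FunctionTransformer` in place of the question's custom transformers (a substitution made only for illustration): even when every branch returns a DataFrame, the default FeatureUnion stacks the outputs into a plain array and the column names are lost.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Each branch selects one column and returns it as a DataFrame...
union = FeatureUnion([
    ("first", FunctionTransformer(lambda X: X[["a"]])),
    ("second", FunctionTransformer(lambda X: X[["b"]])),
])
out = union.fit_transform(df)

# ...but with the default output configuration the union hstacks them into an ndarray.
print(type(out).__name__)  # ndarray
```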

Is there a way to make FeatureUnion return a pandas DataFrame?

Best answer

To make FeatureUnion output a DataFrame, you can use the PandasFeatureUnion from this blog post. See also the gist.
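The blog post and gist are not reproduced here, so the following is only a minimal sketch of the same idea, not their code. It assumes every sub-transformer returns a pandas DataFrame or Series indexed like the input, ignores `transformer_weights` and `"drop"`/`"passthrough"` entries, and runs the branches with joblib's `Parallel` (controlled by the union's `n_jobs`). `TimeTransformer` and `ZipTransformer` here are hypothetical variants that return only their new column.

```python
import pandas as pd
from joblib import Parallel, delayed
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion


def _fit_transform_one(transformer, X, y):
    # Fit one branch and return both the fitted object and its output.
    return transformer, transformer.fit_transform(X, y)


class PandasFeatureUnion(FeatureUnion):
    """Sketch: FeatureUnion that concatenates DataFrame outputs column-wise.
    Does NOT handle transformer_weights or "drop"/"passthrough" entries."""

    def fit(self, X, y=None, **fit_params):
        self.fit_transform(X, y)
        return self

    def fit_transform(self, X, y=None, **fit_params):
        # Run each branch's fit_transform as a separate joblib job.
        results = Parallel(n_jobs=self.n_jobs)(
            delayed(_fit_transform_one)(trans, X, y)
            for _, trans in self.transformer_list
        )
        # Keep the fitted objects (worker processes hand back copies).
        self.transformer_list = [
            (name, fitted)
            for (name, _), (fitted, _) in zip(self.transformer_list, results)
        ]
        # Align on the index and place the outputs side by side.
        return pd.concat([Xt for _, Xt in results], axis=1)

    def transform(self, X):
        outputs = Parallel(n_jobs=self.n_jobs)(
            delayed(trans.transform)(X) for _, trans in self.transformer_list
        )
        return pd.concat(outputs, axis=1)


class TimeTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical: returns ONLY a "time" column (hour of day)."""

    def __init__(self, col):
        self.col = col

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.DataFrame({"time": pd.to_datetime(X[self.col]).dt.hour}, index=X.index)


class ZipTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical: returns ONLY a "zip" column (trailing five digits)."""

    def __init__(self, col):
        self.col = col

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        zips = X[self.col].str.extract(r"(\d{5})\s*$", expand=False)
        return pd.DataFrame({"zip": zips}, index=X.index)


df = pd.DataFrame({
    "Date": ["2016-04-15 08:30", "2016-04-15 17:05"],
    "Address": ["1 Main St, Springfield, IL 62701", "9 Elm St, Austin, TX 73301"],
})
union = PandasFeatureUnion([
    ("GetTimeFromDate", TimeTransformer("Date")),
    ("GetZipFromAddress", ZipTransformer("Address")),
], n_jobs=2)
out = union.fit_transform(df)
print(type(out).__name__, list(out.columns))  # DataFrame ['time', 'zip']
```

Because a union places the branch outputs side by side, transformers that return the whole input DataFrame plus a new column (as described in the question) would produce duplicated "Date" and "Address" columns; that is why each branch in this sketch returns only its new column. On scikit-learn 1.2 or later, `FeatureUnion(...).set_output(transform="pandas")` may be a built-in alternative, provided every sub-transformer supports the `set_output` API.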

A similar question, "python-2.7 - How to make FeatureUnion return a DataFrame", can be found on Stack Overflow: https://stackoverflow.com/questions/36652196/

10-12 19:28