python - How to make FunctionTransformer work in DataFrameMapper

I have a column in my pandas DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame([
    ['26.6 km'],
    ['19.67 km'],
    ['18.2 km'],
    ['20.77 km'],
    ['15.2 km'],
], columns=['Mileage'])

I have a function that removes the " km" suffix from the column:

def remove_words(column):
    return column.str.split(' ').str[0]

When I put it in a DataFrameMapper:

mapper = DataFrameMapper([
     ('Mileage', [FunctionTransformer(remove_words)]),
     ], df_out=True)

...it fails with the error "'numpy.ndarray' object has no attribute 'str'".

Help!

Best answer

Use str.extract or str.replace:

df['Mileage'] = df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)

Or,

df['Mileage'] = df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)

Here is an example:

>>> import pandas as pd
>>> df = pd.DataFrame([
...     ['26.6 km'],
...     ['19.67 km'],
...     ['18.2 km'],
...     ['20.77 km'],
...     ['15.2 km'],
... ], columns=['Mileage'])
>>> df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64
>>> df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64

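Alternatively, if you want to use the FunctionTransformer from sklearn_pandas inside a DataFrameMapper (it applies the function to each value individually; note that this class may not be available in newer releases of sklearn_pandas):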

from sklearn_pandas import DataFrameMapper, FunctionTransformer

def remove_words(val):
    return val.split(' ')[0]

mapper = DataFrameMapper([
     ('Mileage', [FunctionTransformer(remove_words)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2

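With sklearn.preprocessing.FunctionTransformer, the whole array is passed to the function at once, so the function has to iterate over the rows itself: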

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

def remove_words(vals):
    return np.array([v[0].split(' ')[0] for v in vals])

mapper = DataFrameMapper([
     (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
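The error in the question arises because DataFrameMapper passes the transformer a NumPy array rather than a pandas Series, so the .str accessor is not available. A minimal sketch, assuming you want to keep the original .str-based logic: rebuild a Series inside the function. The name remove_words_series is hypothetical, the float conversion is an addition not in the answer above, and the input array simulates what DataFrameMapper passes for a list selector such as ['Mileage'] (a 2-D array with one column).

```python
import numpy as np
import pandas as pd

def remove_words_series(vals):
    # Hypothetical helper. DataFrameMapper hands over a NumPy array, not a
    # Series, so rebuild a Series; ravel() flattens the (n, 1) shape
    s = pd.Series(np.asarray(vals, dtype=object).ravel())
    # The .str accessor from the question works again on the Series;
    # astype(float) turns '26.6' into 26.6 (an addition, not in the answer)
    parsed = s.str.split(' ').str[0].astype(float)
    # Return a 2-D single-column array so the mapper gets one output feature
    return parsed.to_numpy().reshape(-1, 1)

# Simulated input: what DataFrameMapper passes for the selector ['Mileage']
X = np.array([['26.6 km'], ['19.67 km'], ['18.2 km']], dtype=object)
print(remove_words_series(X))
```

In the mapper it would be used like remove_words above, e.g. (['Mileage'], [FunctionTransformer(remove_words_series, validate=False)]).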

Or use numpy.vectorize:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

func = np.vectorize(lambda x: x.split(' ')[0])

def remove_words(vals):
    return func(vals)

mapper = DataFrameMapper([
     (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
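The numpy.vectorize approach can also be checked on its own, without sklearn_pandas. A minimal sketch, assuming numeric output is wanted: the hypothetical parse_km helper drops the unit and converts each value to float, and the 2-D object array stands in for what DataFrameMapper passes for ['Mileage'].

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical helper: '26.6 km' -> 26.6, applied element by element
parse_km = np.vectorize(lambda s: float(s.split(' ')[0]))

# validate=False lets the transformer accept an object (string) array
transformer = FunctionTransformer(parse_km, validate=False)

# Stand-in for the 2-D, one-column array DataFrameMapper passes for ['Mileage']
X = np.array([['26.6 km'], ['19.67 km'], ['15.2 km']], dtype=object)
out = transformer.fit_transform(X)
print(out.dtype, out.shape)
```

Passing parse_km in place of remove_words in the mapper above should yield a float column instead of strings.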

A similar question, "python - How to make FunctionTransformer work in DataFrameMapper", can be found on Stack Overflow: https://stackoverflow.com/questions/59670335/