I have a column in my pandas DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame([
    ['26.6 km'],
    ['19.67 km'],
    ['18.2 km'],
    ['20.77 km'],
    ['15.2 km'],
], columns=['Mileage'])


I have a function that removes " km" from the column:

def remove_words(column):
    return column.str.split(' ').str[0]


When I put it in a DataFrameMapper:

mapper = DataFrameMapper([
     ('Mileage', [FunctionTransformer(remove_words)]),
     ], df_out=True)


...it raises the error "'numpy.ndarray' object has no attribute 'str'".

Help!

Best answer

Use str.extract:

df['Mileage'] = df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)


Or use str.replace:

df['Mileage'] = df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)


Here is an example:

>>> import pandas as pd
>>> df = pd.DataFrame([
...     ['26.6 km'],
...     ['19.67 km'],
...     ['18.2 km'],
...     ['20.77 km'],
...     ['15.2 km'],
... ], columns=['Mileage'])
>>> df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64
>>> df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64
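The extract pattern above can also match an empty string, so a row with no leading number would make astype(float) fail. As a hedged sketch (the stricter pattern, the 'unknown' row, and the use of pd.to_numeric are my assumptions, not part of the original answer), unparseable rows can be turned into NaN instead:

```python
import pandas as pd

# Sample data with one row that contains no number (hypothetical).
s = pd.Series(['26.6 km', '19.67 km', 'unknown'])

# Require at least one digit; rows without a match become NaN.
numbers = s.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# errors='coerce' converts anything that cannot be parsed to NaN
# instead of raising.
mileage = pd.to_numeric(numbers, errors='coerce')
print(mileage)
```

Whether NaN is the right outcome for bad rows depends on the data; raising early may be preferable if every row is expected to be numeric.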


Or, if you want to use the FunctionTransformer from sklearn_pandas inside a DataFrameMapper:

from sklearn_pandas import DataFrameMapper, FunctionTransformer

def remove_words(val):
    return val.split(' ')[0]

mapper = DataFrameMapper([
     ('Mileage', [FunctionTransformer(remove_words)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2


For sklearn.preprocessing.FunctionTransformer:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

def remove_words(vals):
    return np.array([v[0].split(' ')[0] for v in vals])

mapper = DataFrameMapper([
     (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
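The v[0] in remove_words is there because the list selector ['Mileage'] hands the transformer a 2-D array with one column, while a plain string selector gives a 1-D array. The sketch below imitates those two shapes with pandas and NumPy only; it does not call sklearn_pandas, so treat the comparison with the mapper as an assumption about its behaviour.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mileage': ['26.6 km', '19.67 km']})

one_d = df['Mileage'].to_numpy()    # shape (2,), like the selector 'Mileage'
two_d = df[['Mileage']].to_numpy()  # shape (2, 1), like the selector ['Mileage']

def remove_words(vals):
    # Each row of the 2-D input is a length-1 array, hence v[0].
    return np.array([v[0].split(' ')[0] for v in vals])

result = remove_words(two_d)
print(one_d.shape, two_d.shape, result)
```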


Or use numpy.vectorize:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

func = np.vectorize(lambda x: x.split(' ')[0])

def remove_words(vals):
    return func(vals)

mapper = DataFrameMapper([
     (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
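As for why the original function failed: the transformer receives a NumPy array rather than a pandas Series, and arrays have no .str accessor. A minimal sketch of that, using only pandas and NumPy (remove_words_any is a hypothetical helper name, and to_numpy() only approximates what the mapper passes along):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mileage': ['26.6 km', '19.67 km', '18.2 km']})

def remove_words(column):
    # Function from the question: relies on the Series-only .str accessor.
    return column.str.split(' ').str[0]

def remove_words_any(values):
    # Hypothetical wrapper: flatten to 1-D and wrap in a Series so that
    # .str is available, then hand back a NumPy array.
    series = pd.Series(np.asarray(values, dtype=object).ravel())
    return remove_words(series).to_numpy()

raw = df['Mileage'].to_numpy()  # roughly what the transformer receives

try:
    remove_words(raw)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

cleaned = remove_words_any(raw)
print(error_message)
print(cleaned)
```

A wrapper like this could be passed to sklearn.preprocessing.FunctionTransformer with validate=False in place of the list comprehension, though I have not run it through DataFrameMapper here.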

Source: a similar question on Stack Overflow, "python - How to make FunctionTransformer work in DataFrameMapper": https://stackoverflow.com/questions/59670335/
