One column in my pandas DataFrame looks like this:
df = pd.DataFrame([
    ['26.6 km'],
    ['19.67 km'],
    ['18.2 km'],
    ['20.77 km'],
    ['15.2 km'],
], columns=['Mileage'])
I have a function that strips the " km" suffix from the column:
def remove_words(column):
    return column.str.split(' ').str[0]
When I put it in a DataFrameMapper:
mapper = DataFrameMapper([
    ('Mileage', [FunctionTransformer(remove_words)]),
], df_out=True)
...it fails with the error "'numpy.ndarray' object has no attribute 'str'".
Help!
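Best answer
df['Mileage'] = df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)
Or,
# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['Mileage'] = df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)
Here is an example: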
>>> import pandas as pd
>>> df = pd.DataFrame([
...     ['26.6 km'],
...     ['19.67 km'],
...     ['18.2 km'],
...     ['20.77 km'],
...     ['15.2 km'],
... ], columns=['Mileage'])
>>> df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64
>>> df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64
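Both one-liners call astype(float) directly, which raises a ValueError as soon as one value contains no digits at all. The following is a minimal sketch of a more forgiving variant, assuming you would rather get NaN for such rows. The 'n/a' and None rows, the stricter pattern, and the Mileage_km column name are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Hypothetical data: two clean values plus two rows that hold no number.
df = pd.DataFrame({'Mileage': ['26.6 km', '19.67 km', 'n/a', None]})

# Require at least one digit, so rows without a number yield NaN
# instead of an empty string.
number = df['Mileage'].str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# errors='coerce' turns anything that still cannot be parsed into NaN.
df['Mileage_km'] = pd.to_numeric(number, errors='coerce')

print(df)
```

Whether silently producing NaN is preferable to failing loudly depends on the data; for a column that should always be numeric, the stricter astype(float) may be the better check.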
Or, if you want to use the FunctionTransformer from sklearn_pandas inside a DataFrameMapper:
from sklearn_pandas import DataFrameMapper, FunctionTransformer

# Here the function is written for a single value, not for a whole column
def remove_words(val):
    return val.split(' ')[0]

mapper = DataFrameMapper([
    ('Mileage', [FunctionTransformer(remove_words)]),
], df_out=True)
print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
With sklearn.preprocessing.FunctionTransformer:
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# The ['Mileage'] list selector passes a 2-D array, so each row v is a
# one-element array and v[0] is the string
def remove_words(vals):
    return np.array([v[0].split(' ')[0] for v in vals])

mapper = DataFrameMapper([
    (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
], df_out=True)
print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
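The v[0] indexing in the sklearn.preprocessing version depends on the selector being written as the list ['Mileage'] rather than the string 'Mileage': as far as I know, DataFrameMapper hands a 2-D array to the transformer for a list selector and a 1-D array for a string selector. The sketch below only imitates those two shapes with plain pandas calls (it does not run DataFrameMapper itself), so treat it as an illustration of the shapes rather than of the mapper's internals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mileage': ['26.6 km', '19.67 km']})

# Roughly what a string selector ('Mileage') gives the transformer: 1-D.
as_1d = df['Mileage'].to_numpy()
# Roughly what a list selector (['Mileage']) gives the transformer: 2-D.
as_2d = df[['Mileage']].to_numpy()

print(as_1d.shape)  # (2,)
print(as_2d.shape)  # (2, 1)

def remove_words(vals):
    # Each v is a one-element row of the 2-D array, hence v[0].
    return np.array([v[0].split(' ')[0] for v in vals])

result = remove_words(as_2d)
print(result)
```

Passing as_1d to this remove_words would not fail, but v[0] would then be the first character of each string, which is why the selector form and the function have to match.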
Or with numpy.vectorize:
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# np.vectorize applies the lambda to every element, whatever the array shape
func = np.vectorize(lambda x: x.split(' ')[0])

def remove_words(vals):
    return func(vals)

mapper = DataFrameMapper([
    (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
], df_out=True)
print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
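If you would rather keep the .str-based logic from the question, another option is to wrap the incoming array in a Series inside the function, since the original error comes from calling .str on a numpy array. This is a minimal sketch under that assumption; the reshape step and the conversion to float are my additions, and the function is exercised here directly on arrays rather than through DataFrameMapper.

```python
import numpy as np
import pandas as pd

def remove_words(vals):
    # The transformer receives an ndarray, which has no .str accessor,
    # so flatten it into a Series first.
    arr = np.asarray(vals, dtype=object)
    flat = pd.Series(arr.ravel())
    cleaned = flat.str.split(' ').str[0].astype(float)
    # Give the result the same shape as the input (1-D or 2-D).
    return cleaned.to_numpy().reshape(arr.shape)

out_2d = remove_words(np.array([['26.6 km'], ['19.67 km']]))
out_1d = remove_words(np.array(['18.2 km', '20.77 km']))
print(out_2d)
print(out_1d)
```

Because the function accepts either shape, it should work with both the 'Mileage' and the ['Mileage'] selector form when passed to sklearn.preprocessing.FunctionTransformer.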
About "python - How to make FunctionTransformer work in DataFrameMapper": we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59670335/