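I want to append a new column to "trainData"; both DataFrames have 712 rows.
When I try to add the new column "Age" with the .assign method, I get the error below.
What is the correct way to append a column to a DataFrame?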
import pandas as pd

df = pd.read_csv("data/train.csv")
#Dropping the columns
df = df.drop(['Ticket','Cabin'], axis=1)
# Drop the rows that contain missing (NA) values
df = df.dropna()
print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])
trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
print("My train data",trainData)
trainData = trainData.assign(df["Age"])
Here is the exception:
File "<ipython-input-79-3f3ce0263545>", line 1, in <module>
runfile('C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py', wdir='C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network')
File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 688, in runfile
execfile(filename, namespace)
File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py", line 30, in <module>
trainData = trainData.assign(df["Age"])
TypeError: assign() takes 1 positional argument but 2 were given
Best answer
I think you need to specify the column name as a keyword argument:
trainData = trainData.assign(Age=df["Age"])
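DataFrame.assign accepts new columns only as keyword arguments, and each keyword becomes a column name; that is why passing df["Age"] positionally raises "assign() takes 1 positional argument but 2 were given". A minimal sketch using small hypothetical frames (train and src are made-up stand-ins, not the Titanic data):

```python
import pandas as pd

# Hypothetical stand-ins for trainData and df, sharing the same index labels
train = pd.DataFrame({"Sex_female": [1, 0, 1]}, index=[1, 3, 6])
src = pd.DataFrame({"Age": [38.0, 35.0, 54.0]}, index=[1, 3, 6])

# Positional argument: assign() only takes **kwargs, so this raises TypeError
try:
    train.assign(src["Age"])
except TypeError as exc:
    print("TypeError:", exc)

# Keyword argument: the keyword "Age" becomes the new column name
out = train.assign(Age=src["Age"])
print(out)

# Dict unpacking allows column names that are not valid Python identifiers
out2 = train.assign(**{"Passenger Age": src["Age"]})
print(out2.columns.tolist())
```

The dict-unpacking form is only needed when the desired column name contains spaces or other characters that cannot appear in a keyword.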
Thanks to piRSquared for the comment: if the indexes of the two DataFrames differ, use:
trainData = trainData.assign(Age=df["Age"].values)
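But then the data is not aligned by index: the values are assigned by position, so both DataFrames must have the same length and the same row order.
Example: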
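The difference between the two variants is index alignment. When a Series is assigned, pandas matches rows by index label and fills labels that are missing from the Series with NaN; when .values (a plain NumPy array) is assigned, the index is ignored and values are placed in order. A small sketch with hypothetical data, where the Series is deliberately indexed 0..2 instead of 1, 3, 6:

```python
import pandas as pd

# Hypothetical training frame indexed like the rows left after dropna()
train = pd.DataFrame({"Pclass_1": [1, 1, 0]}, index=[1, 3, 6])

# Same number of values, but indexed 0, 1, 2 instead of 1, 3, 6
ages = pd.Series([38.0, 35.0, 54.0], index=[0, 1, 2])

# Aligned by label: only label 1 exists in both, so row 1 gets ages[1] == 35.0
# and rows 3 and 6 get NaN
aligned = train.assign(Age=ages)

# Positional: the Series index is ignored and values are taken in order
# (a length mismatch would raise ValueError)
positional = train.assign(Age=ages.values)

print(aligned)
print(positional)
```

In the question both frames come from the same df, so their indexes already match and the plain Series form is enough; .values is only a workaround when the indexes differ but the row order is known to agree.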
import pandas as pd
import seaborn as sns

# sample df (similar to your data)
df = sns.load_dataset("titanic")
# capitalize the column names
df.columns = df.columns.str.capitalize()
print (df.head())
   Survived  Pclass     Sex   Age  Sibsp  Parch     Fare  Embarked  Class  \
0         0       3    male  22.0      1      0   7.2500         S  Third
1         1       1  female  38.0      1      0  71.2833         C  First
2         1       3  female  26.0      0      0   7.9250         S  Third
3         1       1  female  35.0      1      0  53.1000         S  First
4         0       3    male  35.0      0      0   8.0500         S  Third

     Who  Adult_male  Deck  Embark_town  Alive  Alone
0    man        True   NaN  Southampton     no  False
1  woman       False     C    Cherbourg    yes  False
2  woman       False   NaN  Southampton    yes   True
3  woman       False     C  Southampton    yes  False
4    man        True   NaN  Southampton     no   True
df = df.dropna()
#print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])
trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
#print("My train data",trainData.head())
trainData = trainData.assign(Age=df["Age"])
print("My train data",trainData.head())
My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
 1         1         0         0           1         0           1
 3         1         0         0           1         0           0
 6         1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
 1           0           0  38.0
 3           0           1  35.0
 6           0           1  54.0
10           0           1   4.0
11           0           1  58.0
Another solution, using join:
trainData = trainData.join(df["Age"])
print("My train data",trainData.head())
My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
 1         1         0         0           1         0           1
 3         1         0         0           1         0           0
 6         1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
 1           0           0  38.0
 3           0           1  35.0
 6           0           1  54.0
10           0           1   4.0
11           0           1  58.0
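DataFrame.join also aligns on the index, and by default it performs a left join (how="left"): every row of trainData is kept, and rows whose label is missing from the joined Series get NaN. A Series can be joined directly as long as it has a name, which becomes the column name. A minimal sketch with hypothetical data (label 6 is deliberately missing from the Series):

```python
import pandas as pd

# Hypothetical training frame
train = pd.DataFrame({"Sex_male": [0, 0, 1]}, index=[1, 3, 6])

# Named Series; label 6 is intentionally absent
age = pd.Series([38.0, 35.0], index=[1, 3], name="Age")

# Left join on the index: all rows of train are kept, row 6 gets NaN
joined = train.join(age)
print(joined)

# An unnamed Series cannot be joined; rename() gives it a column name first
unnamed = pd.Series([38.0, 35.0], index=[1, 3])
joined2 = train.join(unnamed.rename("Age"))
print(joined2)
```

df["Age"] in the question is already named "Age", so it can be passed to join as-is.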
After checking the data more closely, it seems you can simply add the column Age to the subset you select:
trainData = titanic_dummies[["Pclass_1", "Pclass_2", "Pclass_3",
                             "Sex_female", "Sex_male",
                             "Embarked_C", "Embarked_Q", "Embarked_S",
                             "Age"]]
print("My train data",trainData.head())
My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
 1         1         0         0           1         0           1
 3         1         0         0           1         0           0
 6         1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
 1           0           0  38.0
 3           0           1  35.0
 6           0           1  54.0
10           0           1   4.0
11           0           1  58.0
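Selecting Age directly works because pd.get_dummies(..., columns=[...]) one-hot encodes only the listed columns and passes every other column, including Age, through unchanged with the same index. A small sketch with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical miniature version of the Titanic frame
df = pd.DataFrame(
    {"Sex": ["male", "female", "female"], "Age": [22.0, 38.0, 26.0]},
    index=[0, 1, 2],
)

# Only "Sex" is one-hot encoded; "Age" is kept as an ordinary column
dummies = pd.get_dummies(df, columns=["Sex"])
print(dummies.columns.tolist())

# Features and Age can therefore be selected in one step, no assign/join needed
train = dummies[["Sex_female", "Sex_male", "Age"]]
print(train)
```

Depending on the pandas version, the dummy columns are printed as 0/1 integers or as True/False booleans.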
Source: the Stack Overflow question "python - Append pandas column: TypeError: assign() takes 1 positional argument but 2 were given": https://stackoverflow.com/questions/45912118/