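I want to append a new column to "trainData"; both data frames have 712 rows.
When I try to append the new column "Age" with the .assign method, I get the error shown below.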

What is the correct way to append a column to a DataFrame?

import pandas as pd

df = pd.read_csv("data/train.csv")
# Drop the columns that are not needed
df = df.drop(['Ticket','Cabin'], axis=1)
# Drop the rows that contain missing values
df = df.dropna()
print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
print("My train data",trainData)
trainData = trainData.assign(df["Age"])


Here is the exception:

  File "<ipython-input-79-3f3ce0263545>", line 1, in <module>
    runfile('C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py', wdir='C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network')

  File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 688, in runfile
    execfile(filename, namespace)

  File "C:\RafiWork\Softwares\MiniConda\envs\python35\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/RafiWork/TASK/Personal/Data Science/Algorithmica/Day2/Titanic_Example/Test Neural Network/decisiontree.py", line 30, in <module>
    trainData = trainData.assign(df["Age"])

TypeError: assign() takes 1 positional argument but 2 were given

Best answer

I think you need to specify the column name as a keyword argument:

trainData = trainData.assign(Age=df["Age"])
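
To illustrate why the keyword is needed: assign() accepts new columns only as keyword arguments (name=values), so a Series passed positionally raises the TypeError from the question. Below is a minimal sketch on made-up data; the frames `features` and `ages` are hypothetical stand-ins, not the Titanic set.

```python
import pandas as pd

# Hypothetical toy data standing in for trainData and df["Age"]
features = pd.DataFrame({"Sex_male": [1, 0, 1]})
ages = pd.Series([22.0, 38.0, 26.0], name="Age")

# Positional use fails: assign() takes its new columns as keywords only
try:
    features.assign(ages)
    positional_ok = True
except TypeError:
    positional_ok = False

# Keyword use works; the keyword becomes the new column's name
with_age = features.assign(Age=ages)

print(positional_ok)
print(list(with_age.columns))
```

Note that assign() returns a new DataFrame and leaves `features` unchanged, which is why the result is bound to a name.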


Thanks to piRSquared for the comment. If the two indexes differ, you can assign the underlying values instead:

trainData = trainData.assign(Age=df["Age"].values)


But then the data are assigned by position and are not aligned by index.
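
The difference between the two forms can be seen on a small example. This is a sketch with hypothetical frames (`left`, `age`) whose index labels match but are in a different order; it is not taken from the Titanic data.

```python
import pandas as pd

# Hypothetical data: same index labels, different order
left = pd.DataFrame({"x": [10, 20, 30]}, index=[1, 3, 6])
age = pd.Series([38.0, 35.0, 54.0], index=[3, 6, 1], name="Age")

# Passing the Series matches rows by index label
aligned = left.assign(Age=age)

# Passing .values ignores the labels and fills rows in positional order
positional = left.assign(Age=age.values)

print(aligned)
print(positional)
```

With the Series, row 1 receives 54.0 because that is the value labeled 1; with .values, row 1 receives 38.0 because it is simply the first element. The .values form also requires both objects to have the same length.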

Sample:

import pandas as pd
import seaborn as sns
# sample df (similar to your data)
df = sns.load_dataset("titanic")
# capitalize the column names
df.columns = df.columns.str.capitalize()
print(df.head())
   Survived  Pclass     Sex   Age  Sibsp  Parch     Fare Embarked  Class  \
0         0       3    male  22.0      1      0   7.2500        S  Third
1         1       1  female  38.0      1      0  71.2833        C  First
2         1       3  female  26.0      0      0   7.9250        S  Third
3         1       1  female  35.0      1      0  53.1000        S  First
4         0       3    male  35.0      0      0   8.0500        S  Third

     Who  Adult_male Deck  Embark_town Alive  Alone
0    man        True  NaN  Southampton    no  False
1  woman       False    C    Cherbourg   yes  False
2  woman       False  NaN  Southampton   yes   True
3  woman       False    C  Southampton   yes  False
4    man        True  NaN  Southampton    no   True




df = df.dropna()
#print("Age ====", df["Age"])
titanic_dummies = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked'])

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3","Sex_female","Sex_male","Embarked_C","Embarked_Q","Embarked_S"]]
#print("My train data",trainData.head())

trainData = trainData.assign(Age=df["Age"])
print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
1          1         0         0           1         0           1
3          1         0         0           1         0           0
6          1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
1            0           0  38.0
3            0           1  35.0
6            0           1  54.0
10           0           1   4.0
11           0           1  58.0


Another solution, using join:

trainData = trainData.join(df["Age"])
print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C  \
1          1         0         0           1         0           1
3          1         0         0           1         0           0
6          1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
1            0           0  38.0
3            0           1  35.0
6            0           1  54.0
10           0           1   4.0
11           0           1  58.0
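
As a rough illustration of how join behaves here: by default it is a left join on the index, and the name of the Series becomes the new column name. The sketch below uses hypothetical toy data; it also shows that joining a column whose name already exists raises a ValueError, so join should be applied to a frame that does not yet contain Age.

```python
import pandas as pd

# Hypothetical data: the Series has an extra label (0) not present in left
left = pd.DataFrame({"x": [10, 20]}, index=[1, 3])
age = pd.Series([22.0, 38.0, 35.0], index=[0, 1, 3], name="Age")

# Left join on index: only the labels of `left` are kept
joined = left.join(age)

# Joining again fails because the column name "Age" now overlaps
try:
    joined.join(age)
    overlap_error = False
except ValueError:
    overlap_error = True

print(joined)
print(overlap_error)
```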


After checking the data, it seems the Age column can simply be included when selecting the subset:

trainData = titanic_dummies[ ["Pclass_1", "Pclass_2", "Pclass_3",
                              "Sex_female","Sex_male",
                              "Embarked_C","Embarked_Q","Embarked_S",
                              "Age"]]

print("My train data",trainData.head())

My train data     Pclass_1  Pclass_2  Pclass_3  Sex_female  Sex_male  Embarked_C \
1          1         0         0           1         0           1
3          1         0         0           1         0           0
6          1         0         0           0         1           0
10         0         0         1           1         0           0
11         1         0         0           1         0           0

    Embarked_Q  Embarked_S   Age
1            0           0  38.0
3            0           1  35.0
6            0           1  54.0
10           0           1   4.0
11           0           1  58.0
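
This works because get_dummies, when given a columns list, encodes only those columns and keeps the remaining ones, so Age is still present in titanic_dummies. A minimal sketch with a hypothetical two-row frame `raw`:

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column
raw = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

# Only "Sex" is encoded; "Age" passes through unchanged
dummies = pd.get_dummies(raw, columns=["Sex"])

# Age can therefore be selected together with the dummy columns
subset = dummies[["Sex_female", "Sex_male", "Age"]]

print(list(dummies.columns))
print(subset)
```

Selecting everything in one step avoids any index alignment question, since all columns come from the same frame.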

Regarding python - Appending a pandas column: TypeError: assign() takes 1 positional argument but 2 were given, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45912118/
