python - Pandas: Split an Excel column populated from a dropdown menu into multiple DataFrame columns and isolate typos

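I have the following column in Excel, which was populated using a dropdown menu. However, some entries were added by disabling the macros and typing the response manually, which introduced some typos.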

      Answers
0     Yes       #correct
1     No        #correct
2     no        #typo - manually entered
3     noo       #typo - manually entered
4     yeah      #typo - manually entered
5     Yes, No   #correct (multiple entries are allowed)

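I would like to create a new DataFrame that keeps the original "Answers" column but adds three more columns: "Yes", "No", and "Typos".
"Yes" and "No" should contain 1 if that answer is present and 0 otherwise. The "Typos" column should contain, as a string, anything that is not in the list of accepted answers, and 0 if there are no typos.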

Example output:

      Answers   Yes    No    Typos
0     Yes       1      0     0
1     No        0      1     0
2     no        0      0     no
3     noo       0      0     noo
4     yeah      0      0     yeah
5     Yes, No   1      1     0

My attempt starts by identifying the unique entries of the "Answers" column, as follows:

all_answers = df['Answers'].str.get_dummies(', ')

This is how I create the additional columns:

accepted_ans=['Yes','No']
idx=1
for i,name in enumerate(all_answers.columns.tolist()):
    if i>0:
        if name in accepted_ans:
            df.insert(idx+i, name, all_answers[name])

This is how I handle the "Typos" column:

df['Typos']=0 #Create empty column with all zeros
for i in range (0, len(df)): #Iterate over the rows
    if df['Answers'].iloc[i] not in accepted_ans:
        df['Typos'].iloc[i]=df['Answers'].iloc[i]

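My problem: the "Typos" column ends up filled with zeros, as if the if statement above, or the line below it, were failing. I would appreciate any suggestions.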

Best answer

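import pandas as pd

df = pd.DataFrame(dict(answers=['Yes', 'No', 'no', 'noo', 'yeah', 'Yes, No']))

def typos(l):
    # Keep only the entries that are not accepted answers
    probs = [e for e in l if e not in ['Yes', 'No']]
    # Join the offenders into one string, or return 0 if there are none
    return ', '.join(probs) if probs else 0

>>> df.answers.str.split(', ').apply(typos)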
0       0
1       0
2      no
3     noo
4    yeah
5       0
Name: answers, dtype: object
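
The same split can also produce the "Yes" and "No" indicator columns the question asks for. The following is only a sketch under stated assumptions, not part of the original answer: the constant `ACCEPTED` and the helper name `find_typos` are made up for illustration, and the column is named `Answers` as in the question.

```python
import pandas as pd

ACCEPTED = ['Yes', 'No']  # assumed list of valid dropdown values

df = pd.DataFrame({'Answers': ['Yes', 'No', 'no', 'noo', 'yeah', 'Yes, No']})

# One list of tokens per row, e.g. 'Yes, No' -> ['Yes', 'No']
tokens = df['Answers'].str.split(', ')

# 1/0 indicator column for each accepted answer
for name in ACCEPTED:
    df[name] = tokens.apply(lambda parts, name=name: int(name in parts))

def find_typos(parts):
    # Unaccepted tokens joined into one string, or 0 when the row is clean
    bad = [p for p in parts if p not in ACCEPTED]
    return ', '.join(bad) if bad else 0

df['Typos'] = tokens.apply(find_typos)
print(df)
```

Assigning each column in a single statement also sidesteps the chained `df['Typos'].iloc[i] = ...` assignment from the question, which pandas does not guarantee will write back to `df`.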

If your column is of mixed type (i.e. not all entries are strings), you may need to convert it to strings first:

df.answers.astype(str).str.split(', ').apply(typos)
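
One caveat with a blanket string conversion: depending on the pandas version, an empty Excel cell (NaN) may come through as the literal text 'nan' and then be reported as a typo. The sketch below is one possible way to handle that, assuming blank cells should count as "no typo"; the sample data and the names `ACCEPTED`, `raw` and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

ACCEPTED = ['Yes', 'No']  # assumed list of valid dropdown values

# Hypothetical column with a blank cell (NaN) and a numeric entry
raw = pd.Series(['Yes', np.nan, 3, 'Yes, noo'], name='answers')

def typos(parts):
    # Skip empty tokens so blank cells are not reported as typos
    probs = [p for p in parts if p and p not in ACCEPTED]
    return ', '.join(probs) if probs else 0

# Replace missing cells with '' and convert everything else to str
cleaned = raw.map(lambda v: '' if pd.isna(v) else str(v))
result = cleaned.str.split(', ').apply(typos)
print(result.tolist())
```

Here the numeric entry is still flagged (as the string '3'), while the blank cell yields 0.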

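Regarding python - Pandas: Split an Excel column populated from a dropdown menu into multiple DataFrame columns and isolate typos, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49954777/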