python - Split column >> get unique values >> add unique values back as columns

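I am learning Python and took a dataset from Kaggle to learn more about data exploration and visualization in Python.

I have a "cuisines" column in my data frame, in the following format: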

North Indian, Mughlai, Chinese
Chinese, North Indian, Thai
Cafe, Mexican, Italian
South Indian, North Indian
North Indian, Rajasthani
North Indian
North Indian, South Indian, Andhra, Chinese

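I want to split this column on commas and get the unique values from it. I then want to add those unique values back to the original data frame as new columns.

Based on other posts, I tried the following:

1) Convert to a list, flatten, and build a set to get the unique values

The type function shows that the column is a Series. Converting it to a list and then to a set raises an error: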


type(fl1.cuisines)
pandas.core.series.Series

cuisines_type = fl1['cuisines'].tolist()
type(cuisines_type)
list

cuisines_type
#this returns list of cuisines

cuisines_set = set([ a for b in cuisines_type for a in b])
TypeError: 'float' object is not iterable
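The TypeError most likely comes from missing values: pandas stores a missing entry in a text column as NaN, which is a float, so `for a in b` fails on it. Even for real strings, `for a in b` walks over individual characters rather than cuisines, so each string has to be split first. A minimal sketch, assuming the column holds comma-plus-space separated strings; the trailing NaN row is an assumption added only to reproduce the error:

```python
import numpy as np
import pandas as pd

# Sample rows from the question plus one NaN (assumed) to mimic a missing value.
fl1 = pd.DataFrame({'cuisines': [
    'North Indian, Mughlai, Chinese',
    'Chinese, North Indian, Thai',
    'Cafe, Mexican, Italian',
    'South Indian, North Indian',
    'North Indian, Rajasthani',
    'North Indian',
    'North Indian, South Indian, Andhra, Chinese',
    np.nan,
]})

cuisines_type = fl1['cuisines'].tolist()

# Skip non-strings (NaN is a float), split each string on ', ' and flatten
# the pieces into one set of unique cuisine names.
cuisines_set = {c for s in cuisines_type if isinstance(s, str)
                for c in s.split(', ')}
print(sorted(cuisines_set))
```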

2) Convert it to an array and then to a list

cs = pd.unique(fl1['cuisines'].str.split(',',expand=True).stack())

type(cs)
Out[141]: numpy.ndarray

cs.tolist()

This returns a list, but I can't remove the leading whitespace that ends up on some of the elements.
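The extra spaces come from splitting on `','` alone: every cuisine after the first in a row keeps its leading space, so `'Chinese'` and `' Chinese'` count as different values. A sketch assuming the same sample rows; it uses `Series.explode` in place of `expand=True` plus `stack` (a swapped-in technique, not the question's code) and strips each item before deduplicating:

```python
import pandas as pd

# Sample rows from the question (assumed representative of the real column).
fl1 = pd.DataFrame({'cuisines': [
    'North Indian, Mughlai, Chinese',
    'Chinese, North Indian, Thai',
    'Cafe, Mexican, Italian',
    'South Indian, North Indian',
    'North Indian, Rajasthani',
    'North Indian',
    'North Indian, South Indian, Andhra, Chinese',
]})

# One row per cuisine; items after the first still carry a leading space.
items = fl1['cuisines'].str.split(',').explode()

# Strip surrounding whitespace, drop missing entries, then deduplicate
# (Series.unique keeps the order of first appearance).
cs = list(items.str.strip().dropna().unique())
print(cs)
```

Splitting on `', '` directly, as the answer below does, avoids the stray spaces as long as the separator is always exactly comma plus one space; `str.strip` is the more forgiving option if the spacing varies.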

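The expected output is the unique list of cuisines, added back as columns:

North Indian|Mughlai|Chinese

Best answer

I believe you need Series.str.get_dummies. If duplicate columns are possible, aggregate them with max per column so the output is always 0 or 1, or use sum to count the values: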

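#max(level=0, axis=1) was removed in pandas 2.0, so group by the column labels after transposing
df = fl1.cuisines.str.get_dummies(', ').T.groupby(level=0).max().T
#if need count values
#df = fl1.cuisines.str.get_dummies(', ').T.groupby(level=0).sum().T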
print (df)
   Andhra  Cafe  Chinese  Italian  Mexican  Mughlai  North Indian  Rajasthani  \
0       0     0        1        0        0        1             1           0
1       0     0        1        0        0        0             1           0
2       0     1        0        1        1        0             0           0
3       0     0        0        0        0        0             1           0
4       0     0        0        0        0        0             1           1
5       0     0        0        0        0        0             1           0
6       1     0        1        0        0        0             1           0

   South Indian  Thai
0             0     0
1             0     1
2             0     0
3             1     0
4             0     0
5             0     0
6             1     0
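The table above holds only the indicator columns; the question also asks to add them back to the original data frame. Because `str.get_dummies` keeps the original row index, a plain `join` should line the rows up. A sketch, assuming `fl1` holds only the sample rows from the question:

```python
import pandas as pd

# Sample rows from the question (assumed representative of the real column).
fl1 = pd.DataFrame({'cuisines': [
    'North Indian, Mughlai, Chinese',
    'Chinese, North Indian, Thai',
    'Cafe, Mexican, Italian',
    'South Indian, North Indian',
    'North Indian, Rajasthani',
    'North Indian',
    'North Indian, South Indian, Andhra, Chinese',
]})

# One 0/1 indicator column per cuisine; sep=', ' splits on comma plus space,
# so no stray whitespace ends up in the column names.
dummies = fl1['cuisines'].str.get_dummies(sep=', ')

# join aligns on the index, so each indicator row sits next to its source row.
out = fl1.join(dummies)
print(out)
```

If the original text column is no longer needed, `fl1.drop(columns='cuisines').join(dummies)` gives the same result without it.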

Something similar is possible by combining your solution with get_dummies:

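#in pandas >= 2.0 get_dummies returns bool columns and max(level=0) was removed, so use groupby and cast to int
df = pd.get_dummies(fl1['cuisines'].str.split(', ',expand=True).stack()).groupby(level=0).max().astype(int)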
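The same indicator table can also be sketched with `Series.explode` instead of `expand=True` plus `stack` (an alternative, not part of the answer), which avoids the empty padding cells that `expand=True` creates for rows with fewer cuisines. Assuming a recent pandas and the sample rows from the question, a cross-check against the `str.get_dummies` result:

```python
import pandas as pd

# Sample rows from the question (assumed representative of the real column).
fl1 = pd.DataFrame({'cuisines': [
    'North Indian, Mughlai, Chinese',
    'Chinese, North Indian, Thai',
    'Cafe, Mexican, Italian',
    'South Indian, North Indian',
    'North Indian, Rajasthani',
    'North Indian',
    'North Indian, South Indian, Andhra, Chinese',
]})

# One row per (original row, cuisine); the index still points at the original row.
exploded = fl1['cuisines'].str.split(', ').explode()

# Indicator columns per cuisine, collapsed back to one row per original index.
# int64 is used explicitly so the dtype matches str.get_dummies on every platform.
df = pd.get_dummies(exploded).groupby(level=0).max().astype('int64')

# Compare labels and values with the Series.str.get_dummies result.
expected = fl1['cuisines'].str.get_dummies(', ')
same = (list(df.columns) == list(expected.columns)
        and df.to_numpy().tolist() == expected.to_numpy().tolist())
print(same)
```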

A similar question about python - Split column >> get unique values >> add unique values back as columns can be found on Stack Overflow: https://stackoverflow.com/questions/56899563/