python - How to split the "location" column in a given dataframe?

I'm working with a dataset that has a column named "location", as in the title. Its values look like this:

import pandas as pd

df = pd.DataFrame(data={"location":["düsseldorf, nordrhein-westfalen, germany",
                                    "durbanville , cape town, cape town , south africa"]})

I want to split this column into ['city', 'state', 'country']. Note that the second row contains a duplicate ("cape town" appears twice).

I've tried the following, but it doesn't handle the duplicates:

location = df.location.str.split(', ', n=2, expand=True)

location.columns = ['city', 'state', 'country']
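To see why the attempt above fails on the second row, here is a minimal sketch that applies the same split (at most 2 splits on the exact ", " separator) with plain Python str.split to that single value. The duplicate is never removed, and because the separator before the repeated "cape town" is " , " rather than ", ", the leftover text ends up lumped into the last column:

```python
value = "durbanville , cape town, cape town , south africa"

# Same logic as df.location.str.split(', ', n=2, expand=True), on one string:
# split at most twice on the exact ", " separator.
parts = value.split(", ", 2)
print(parts)  # ['durbanville ', 'cape town', 'cape town , south africa']
```

The "state" slot gets the first "cape town", and "country" receives everything that remains, including the duplicate and the stray spaces.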

Best answer

You can use the unique_everseen recipe from the itertools docs, which is also available in third-party libraries such as toolz.unique.

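This logic can be folded into a list comprehension that iterates over df['location']. That is likely more efficient than Pandas string methods, which don't offer truly vectorised functionality here anyway.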
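If you'd rather not install toolz, the recipe is short enough to define yourself. The following is a simplified sketch of the unique_everseen recipe as it appears in the itertools documentation (the function lives in your own code, it is not importable from itertools), applied to the two sample strings with the same strip-then-deduplicate logic:

```python
from itertools import filterfalse


def unique_everseen(iterable, key=None):
    """Yield unique elements, preserving the order of first appearance.

    Simplified from the recipe in the itertools documentation.
    """
    seen = set()
    if key is None:
        # filterfalse is lazy, so each element is added to `seen`
        # before the next one is checked.
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element


locations = [
    "düsseldorf, nordrhein-westfalen, germany",
    "durbanville , cape town, cape town , south africa",
]

# Split on bare commas, strip the surrounding whitespace of each piece,
# then drop repeats while keeping the first occurrence.
rows = [list(unique_everseen(map(str.strip, loc.split(",")))) for loc in locations]
print(rows)
```

Splitting on "," and stripping afterwards (instead of splitting on ", ") is what lets " , " and ", " be treated the same, so the two "cape town" pieces compare equal and the second is dropped.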

from toolz import unique

# split on commas, strip whitespace from each piece, keep the first occurrence of each
res = pd.DataFrame([list(unique(map(str.strip, i.split(',')))) for i in df['location']])

res.columns = ['city', 'state', 'country']

print(res)

          city                state       country
0   düsseldorf  nordrhein-westfalen       germany
1  durbanville            cape town  south africa
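A dependency-free alternative, sketched here as an option rather than as part of the original answer: since Python 3.7, dict preserves insertion order, so dict.fromkeys(...) keeps the first occurrence of each piece in order. Zipping the result with the column names produces one dict per row:

```python
locations = [
    "düsseldorf, nordrhein-westfalen, germany",
    "durbanville , cape town, cape town , south africa",
]
columns = ["city", "state", "country"]

# dict.fromkeys drops repeated pieces but keeps first-seen order (Python 3.7+);
# iterating over the resulting dict yields its keys, which zip pairs with column names.
records = [
    dict(zip(columns, dict.fromkeys(part.strip() for part in loc.split(","))))
    for loc in locations
]
print(records)
```

Passing records to pd.DataFrame(records) would give the same city/state/country columns as above. One design consequence: zip silently truncates rows with more than three unique pieces, and a row with fewer pieces simply lacks those keys (pandas fills them with NaN), whereas assigning res.columns directly raises an error if the column count does not match.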

For python - How to split the "location" column in a given dataframe?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52567930/