This post covers how to convert a pandas string column containing percent-encoded Unicode characters into readable text so that the URLs can be loaded.
Problem description
I have a pandas DataFrame with a column of Wikipedia URLs that I want to load. However, some of the strings won't load because they contain percent-encoded Unicode characters. For example, a title such as 'Kruskal%E2%80%93Wallis_one-way_analysis_of_variance' fails with an error like the following:
PageError: Page id "Cauchy%E2%80%93Schwarz_inequality" does not match any pages. Try another id!
Is there a way to turn all of these encoded Unicode characters back into plain text? In this case, I need a function that can create a new column:
old column                          new column
Cauchy%E2%80%93Schwarz_inequality   Cauchy–Schwarz_inequality
Markov%27s_inequality               Markov's_inequality
Recommended answer
urllib.parse.unquote should do the trick: it decodes percent-encoded sequences such as %E2%80%93 back into the characters they represent. Hope this helps.
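In [1]: import urllib.parse  # import the submodule explicitly; a bare `import urllib` may not load it
   ...: import pandas as pd
   ...:
   ...: df = pd.DataFrame({'url': ['Markov%27s_inequality', 'Cauchy%E2%80%93Schwarz_inequality']})
   ...: # Decode the percent-encoded characters in each title into a new column
   ...: df['clean_url'] = df['url'].apply(urllib.parse.unquote)
   ...: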
In [2]: df
Out[2]:
                                 url                  clean_url
0              Markov%27s_inequality        Markov's_inequality
1  Cauchy%E2%80%93Schwarz_inequality  Cauchy–Schwarz_inequality
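To see what the decoding does to each title independently of pandas, here is a minimal sketch using only the standard library. The helper name decode_title and the choice to pass non-string values through unchanged are assumptions for illustration, not part of the original answer. Note that the decoded en dash is not an ASCII character, so the result is a decoded Unicode string rather than ASCII in the strict sense.

```python
from urllib.parse import unquote


def decode_title(title):
    """Decode percent-encoded sequences (UTF-8 by default) in a page title.

    Hypothetical helper: non-string values such as None are returned
    unchanged so that missing entries do not raise a TypeError.
    """
    return unquote(title) if isinstance(title, str) else title


titles = ["Markov%27s_inequality", "Cauchy%E2%80%93Schwarz_inequality", None]
decoded = [decode_title(t) for t in titles]

for before, after in zip(titles, decoded):
    print(before, "->", after)

# The en dash (U+2013) is outside ASCII, so this title is not pure ASCII.
print(decoded[1].isascii())
```

If the column may contain missing values, a helper like this could be passed to df['url'].apply in place of urllib.parse.unquote.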