This article covers how to convert a pandas string column containing percent-encoded Unicode characters into readable text so that the URLs can be loaded.

Problem description

I have a pandas DataFrame with a column of Wikipedia URLs that I want to load. However, some strings won't load because they contain percent-encoded Unicode characters. For example, 'Kruskal%E2%80%93Wallis_one-way_analysis_of_variance' raises an error like the following:

PageError: Page id "Cauchy%E2%80%93Schwarz_inequality" does not match any pages. Try another id!

Is there a way to turn all of these encoded Unicode sequences into plain characters? In this case, I need a function that can create a new column:

old column                            new column
Cauchy%E2%80%93Schwarz_inequality     Cauchy–Schwarz_inequality
Markov%27s_inequality                 Markov's_inequality

Recommended answer

urllib.parse.unquote should do the trick. It replaces each %XX escape with the character it represents, decoding multi-byte sequences as UTF-8 by default. Hope this helps.
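To see what unquote does on its own, here is a minimal sketch that uses only the standard library and one of the strings from the question. One thing it illustrates: the result is decoded text rather than strict ASCII, because the en dash (U+2013) is not an ASCII character. The variable names are just for illustration.

```python
from urllib.parse import quote, unquote

encoded = "Cauchy%E2%80%93Schwarz_inequality"

# %E2%80%93 is the UTF-8 byte sequence for U+2013 (en dash)
decoded = unquote(encoded)
print(decoded)

# The decoded title contains a non-ASCII character
print(decoded.isascii())

# quote() reverses the operation for this string
print(quote(decoded) == encoded)
```

If the goal is only to get a title that the Wikipedia lookup accepts, the decoded form is usually what you want, even though it is not pure ASCII.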

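In [1]: import urllib.parse  # import the submodule explicitly; a bare `import urllib` may not load it
   ...: import pandas as pd
   ...: 
   ...: df = pd.DataFrame({'url': ['Markov%27s_inequality', 'Cauchy%E2%80%93Schwarz_inequality']})
   ...: # Decode the percent-escapes in every row and store the result in a new column
   ...: df['clean_url'] = df['url'].apply(urllib.parse.unquote)
   ...: 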

In [2]: df
Out[2]: 
                                 url                  clean_url
0              Markov%27s_inequality        Markov's_inequality
1  Cauchy%E2%80%93Schwarz_inequality  Cauchy–Schwarz_inequality
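
One caveat that the example above does not cover: if the column has missing values (None or NaN), passing them straight to unquote should fail with a TypeError, since it expects a string. A possible workaround is a small wrapper that only decodes strings. The name safe_unquote is hypothetical and not part of urllib or pandas; treat this as a sketch under that assumption.

```python
from urllib.parse import unquote


def safe_unquote(value):
    """Decode percent-escapes in strings; return anything else unchanged.

    Hypothetical helper for illustration, not part of the original answer.
    """
    if isinstance(value, str):
        return unquote(value)
    return value  # e.g. None or float('nan') standing in for a missing URL


samples = ["Markov%27s_inequality", None, float("nan")]
cleaned = [safe_unquote(s) for s in samples]
print(cleaned)
```

With a DataFrame like the one above, this would be used as df['url'].apply(safe_unquote) in place of urllib.parse.unquote.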

