Vectorized operation to get the length of strings


Problem description

I have a pandas DataFrame.

import pandas as pd

df = pd.DataFrame(['Donald Dump','Make America Great Again!','Donald Shrimp'],
                   columns=['text'])

I would like to add another column to the DataFrame that holds the length of each string in the 'text' column.

For the example above, it would be:

                        text  text_length
0                Donald Dump           11
1  Make America Great Again!           25
2              Donald Shrimp           13

I know I can loop through the rows and get each length, but is there any way to vectorize this operation? I have a few million rows.

Solution

I think the easiest way is to use the apply method on the 'text' column (df['text'] is a Series, so this is Series.apply). With this method you can run any Python function on each value.

You could do something like:

df['text_length'] = df['text'].apply(len)

to create a new column with the data you want.
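
For comparison, here is a minimal sketch of both ways to build the column, using the example frame from the question. The second column name (text_length_str) and the small Series with a missing value are assumptions added for illustration only. Note that apply(len) still calls len once per element in Python, while .str.len() is the pandas string accessor; the two also differ in how they treat missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    ['Donald Dump', 'Make America Great Again!', 'Donald Shrimp'],
    columns=['text'],
)

# Option 1: call Python's built-in len() on every value via Series.apply
df['text_length'] = df['text'].apply(len)

# Option 2: use the .str accessor (column name is hypothetical, for comparison)
df['text_length_str'] = df['text'].str.len()

print(df)

# Assumption for illustration: a Series containing a missing value.
# .str.len() leaves the missing entry as NaN instead of raising.
with_missing = pd.Series(['abc', np.nan, 'de'])
lengths = with_missing.str.len()
print(lengths)
```

If the column may contain NaN, apply(len) raises a TypeError because a float has no length, so .str.len() is the safer choice in that case.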


Edit: After seeing @jezrael's answer, I was curious and decided to time both approaches with %timeit. I created a DataFrame full of lorem ipsum sentences (101000 rows), and the difference is quite small. For me, I got:

In [59]: %timeit df['text_length'] = (df.text.str.len())
10 loops, best of 3: 20.6 ms per loop

In [60]: %timeit df['text_length'] = df['text'].apply(len)
100 loops, best of 3: 17.6 ms per loop
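
To repeat this comparison outside IPython, a sketch along the following lines could be used. The sample sentence, the way the 101000 rows are built, and the best_ms helper are all assumptions, since the original test data is not shown; absolute timings will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Assumption: a stand-in sentence; the original lorem ipsum data is not given
sentence = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
df = pd.DataFrame({'text': [sentence] * 101_000})


def best_ms(func, repeat=3, number=10):
    """Return the best per-call time in milliseconds (hypothetical helper)."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1000


str_len_ms = best_ms(lambda: df['text'].str.len())
apply_len_ms = best_ms(lambda: df['text'].apply(len))

print(f'.str.len():  {str_len_ms:.1f} ms per call')
print(f'.apply(len): {apply_len_ms:.1f} ms per call')
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports above.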


08-22 21:31