Problem description
I have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(['Donald Dump', 'Make America Great Again!', 'Donald Shrimp'],
                  columns=['text'])
I would like to add another column to the DataFrame that holds the length of each string in the 'text' column.
For the example above, the result would be:
                        text  text_length
0                Donald Dump           11
1  Make America Great Again!           25
2              Donald Shrimp           13
I know I can loop through it and get the length, but is there any way to vectorize this operation? I have a few million rows.
I think the easiest way is to use the apply method on the 'text' column (a pandas Series). With this method you can run any function over the values, so you can manipulate the data any way you want.
You could do something like:
df['text_length'] = df['text'].apply(len)
to create a new column with the data you want.
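As a minimal sketch, the apply call above can be put side by side with the Series.str.len accessor that appears in the timings in this answer. The column name text_length_str and the small with_missing Series are assumptions added here for illustration; they are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame(
    ['Donald Dump', 'Make America Great Again!', 'Donald Shrimp'],
    columns=['text'],
)

# Option 1: call the built-in len on every value of the column.
df['text_length'] = df['text'].apply(len)

# Option 2: use the .str accessor, which exposes string methods per element.
# (text_length_str is a hypothetical column name used only for comparison.)
df['text_length_str'] = df['text'].str.len()

print(df)

# Illustrative assumption: a column that contains a missing value.
# .str.len() leaves the missing entry as missing, whereas apply(len)
# would raise a TypeError because len() is not defined for None/NaN.
with_missing = pd.Series(['abc', None])
lengths = with_missing.str.len()
print(lengths)
```

If the column may contain missing values, the .str.len() form is likely the more convenient of the two, since it does not need a separate guard around len.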
Edit: After seeing @jezrael's answer, I was curious and decided to time both approaches with %timeit. I created a DataFrame full of lorem ipsum sentences (101000 rows), and the difference is quite small. For me, I got:
In [59]: %timeit df['text_length'] = (df.text.str.len())
10 loops, best of 3: 20.6 ms per loop
In [60]: %timeit df['text_length'] = df['text'].apply(len)
100 loops, best of 3: 17.6 ms per loop
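To repeat a comparison like this outside IPython, one option is the standard library's timeit module. The sketch below is only an approximation of the setup described above: the repeated sentence, the helper names with_str_len and with_apply, and the loop counts are all assumptions, and the numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Assumption: one repeated sentence stands in for the lorem ipsum data.
sentence = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
df = pd.DataFrame({'text': [sentence] * 101000})


def with_str_len():
    # Length via the .str accessor.
    return df['text'].str.len()


def with_apply():
    # Length via apply with the built-in len.
    return df['text'].apply(len)


# Run each function 5 times per measurement, take the best of 3 measurements,
# and divide by 5 to get seconds per single call.
loops = 5
t_str = min(timeit.repeat(with_str_len, number=loops, repeat=3)) / loops
t_apply = min(timeit.repeat(with_apply, number=loops, repeat=3)) / loops

print(f'str.len():  {t_str * 1000:.1f} ms per loop')
print(f'apply(len): {t_apply * 1000:.1f} ms per loop')
```

Both functions return the same lengths, so the choice between them can rest on readability and on how missing values should be handled rather than on the small timing gap.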