Fastest way to convert a list of tuples to a Series

Question

Consider a list of tuples, lst:

lst = [('a', 10), ('b', 20)]

What is the quickest way to convert this to the following Series?

i
a    10
b    20
Name: c, dtype: int64

My attempt:

pd.DataFrame(lst, columns=list('ic')).set_index('i').c

This works, but it is inefficient.

Recommended answer

There are two possible downsides to @Divakar's np.asarray(lst). First, it converts everything to strings, so pandas would have to convert the values back to integers. Second, speed: making arrays is relatively expensive.

An alternative is to use the zip(*) idiom to 'transpose' the list:

In [65]: lst = [('a', 10), ('b', 20), ('j',1000)]
In [66]: zlst = list(zip(*lst))
In [67]: zlst
Out[67]: [('a', 'b', 'j'), (10, 20, 1000)]
In [68]: out = pd.Series(zlst[1], index = zlst[0])
In [69]: out
Out[69]:
a      10
b      20
j    1000
dtype: int32

Note that my dtype is int, not object. By contrast, when out is rebuilt from the array (np.asarray(lst)), its values are strings:

In [79]: out.values
Out[79]: array(['10', '20', '1000'], dtype=object)

So in the array case, pandas doesn't convert the values back to integers; it leaves them as strings.
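The dtype difference described above can be reproduced with a short script. This is only a sketch: the variable names (from_array, from_zip, named) are illustrative and not from the original answer, and the last step, which attaches the index name 'i' and the series name 'c' requested in the question, is an addition the answer itself does not show.

```python
import numpy as np
import pandas as pd

lst = [('a', 10), ('b', 20), ('j', 1000)]

# Array route: NumPy needs one common dtype for the whole 2-D array,
# so the integers are stored as strings next to the labels.
arr = np.asarray(lst)
from_array = pd.Series(arr[:, 1], index=arr[:, 0])

# zip(*) route: labels and values end up in separate tuples,
# so the values are still Python ints when pandas sees them.
labels, values = zip(*lst)
from_zip = pd.Series(values, index=labels)

# Optional (not in the original answer): name the index and the series
# so the result matches the layout asked for in the question.
named = pd.Series(values, index=pd.Index(labels, name='i'), name='c')

print(arr.dtype)          # a fixed-width string dtype
print(list(from_array))   # the values are strings such as '10'
print(from_zip.dtype)     # an integer dtype (int32 or int64 by platform)
print(named)
```

If integers are needed from the array route, the value column has to be cast explicitly, which is part of the extra cost mentioned above.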

==============


My guess about timings was off; I don't have any feel for pandas Series creation times. Also, the sample is too small to do meaningful timings:

In [71]: %%timeit
    ...: out=pd.Series(dict(lst))
1000 loops, best of 3: 305 µs per loop
In [72]: %%timeit
    ...: arr=np.array(lst)
    ...: out = pd.Series(arr[:,1], index=arr[:,0])
10000 loops, best of 3: 198 µs per loop
In [73]: %%timeit
    ...: zlst = list(zip(*lst))
    ...: out = pd.Series(zlst[1], index=zlst[0])
    ...:
1000 loops, best of 3: 275 µs per loop

Or, forcing the integer interpretation:

In [85]: %%timeit
    ...: arr=np.array(lst)
    ...: out = pd.Series(arr[:,1], index=arr[:,0], dtype=int)
    ...:
    ...:
1000 loops, best of 3: 253 µs per loop
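Since a three-element list is too small for meaningful timings, the comparison can be repeated on a larger input. The following is a minimal sketch using the standard timeit module instead of the %%timeit magic. The 10,000-element list big and the function names are assumptions made for illustration, and the array route casts with NumPy's astype(int) rather than passing dtype=int to pd.Series, which is a substitution for portability across pandas versions. No timing results are claimed here; they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger input: 10,000 unique string labels with integer values.
big = [(f"k{i}", i) for i in range(10_000)]


def via_dict(lst):
    # dict keys become the index, dict values become the data.
    return pd.Series(dict(lst))


def via_array(lst):
    # The 2-D array holds strings, so the value column is cast back to int.
    arr = np.array(lst)
    return pd.Series(arr[:, 1].astype(int), index=arr[:, 0])


def via_zip(lst):
    # Transpose the list of pairs into a labels tuple and a values tuple.
    labels, values = zip(*lst)
    return pd.Series(values, index=labels)


repeats = 5
for fn in (via_dict, via_array, via_zip):
    seconds = timeit.timeit(lambda: fn(big), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e3:.2f} ms per call")
```

All three functions build the same labels and values, so any difference in the printed numbers comes from the construction route alone.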


08-16 07:15