Problem description
Here is a DataFrame:
A B C
0 6 2 -5
1 2 5 2
2 10 3 1
3 -5 2 8
4 3 6 2
I can retrieve a column that is basically a tuple of the columns of the original df by using df.apply:
out = df.apply(tuple, 1)
print(out)
0 (6, 2, -5)
1 (2, 5, 2)
2 (10, 3, 1)
3 (-5, 2, 8)
4 (3, 6, 2)
dtype: object
But if I want a list of values instead of a tuple of them, I can't do it, because it doesn't give me what I expect:
out = df.apply(list, 1)
print(out)
A B C
0 6 2 -5
1 2 5 2
2 10 3 1
3 -5 2 8
4 3 6 2
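The DataFrame printed above is what the asker got from the pandas of the time. To the best of my knowledge, pandas 0.23 changed DataFrame.apply here. With the default result_type=None, a function that returns a list now produces a Series of lists, and result_type="expand" spreads those lists into columns. The sketch below assumes pandas 0.23 or later and rebuilds the example frame by hand.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]})

# Default result_type=None: list-like return values stay one object per row,
# so (assuming pandas >= 0.23) this is a Series of lists
as_series = df.apply(list, axis=1)
print(type(as_series).__name__)  # Series
print(as_series[0])              # [6, 2, -5]

# result_type="expand" turns each returned list into a row of a new DataFrame
expanded = df.apply(list, axis=1, result_type="expand")
print(type(expanded).__name__)   # DataFrame
print(expanded.shape)            # (5, 3)
```

If you want the old "frame in, frame out" shape, ask for it explicitly with result_type="expand" instead of relying on inference.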
Instead, I need to do:
out = pd.Series(df.values.tolist())
print(out)
0 [6, 2, -5]
1 [2, 5, 2]
2 [10, 3, 1]
3 [-5, 2, 8]
4 [3, 6, 2]
dtype: object
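One detail of this workaround is that pd.Series(df.values.tolist()) builds a fresh 0..n-1 index, so any custom row labels on df are lost. The following minimal sketch uses a hypothetical letter index to show that passing index=df.index keeps the rows aligned. It also uses df.to_numpy(), which pandas recommends over .values in newer versions.

```python
import pandas as pd

# Same data as the question, but with hypothetical non-default row labels
df = pd.DataFrame(
    {"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]},
    index=["a", "b", "c", "d", "e"],
)

# The original workaround: the resulting Series gets a new RangeIndex
no_index = pd.Series(df.values.tolist())

# Passing index=df.index keeps the original row labels
with_index = pd.Series(df.to_numpy().tolist(), index=df.index)

print(no_index.index.tolist())    # [0, 1, 2, 3, 4]
print(with_index.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(with_index["a"])            # [6, 2, -5]
```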
Why can't I use df.apply(list, 1) to get what I want?
Addendum
Timings of some possible workarounds:
df_test = pd.concat([df] * 10000, axis=0)
%timeit pd.Series(df.values.tolist()) # original workaround
10000 loops, best of 3: 161 µs per loop
%timeit df.apply(tuple, 1).apply(list, 1) # proposed by Alexander
1000 loops, best of 3: 615 µs per loop
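The %timeit lines above pass df rather than the enlarged df_test, so they time the 5-row frame. Below is a minimal sketch that times the workarounds on df_test with the standard timeit module and checks that they agree. The candidate names are illustrative, and index=d.index is an added choice so that every result keeps the same labels. It prints whatever timings your machine produces, and no particular numbers are implied.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]})
# Enlarged frame as in the addendum; axis is passed by keyword for newer pandas
df_test = pd.concat([df] * 10000, axis=0, ignore_index=True)

# Each candidate turns a DataFrame into a Series holding one list per row
candidates = {
    "values.tolist": lambda d: pd.Series(d.values.tolist(), index=d.index),
    "apply(tuple).apply(list)": lambda d: d.apply(tuple, axis=1).apply(list),
    "list comprehension": lambda d: pd.Series(
        [list(x) for x in d.itertuples(index=False)], index=d.index
    ),
}

results = {}
for name, func in candidates.items():
    results[name] = func(df_test)
    # Average wall-clock time over a few runs on the 50,000-row frame
    seconds = timeit.timeit(lambda: func(df_test), number=3) / 3
    print(f"{name}: {seconds * 1e3:.1f} ms per run")
```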
Recommended answer
The culprit is here. With func=tuple it works, but using func=list raises an exception from within the compiled module lib.reduce:
ValueError: ('function does not reduce', 0)
As you can see, they catch the exception but don't bother to handle it.
Even without the too-broad except clause, that's a bug in pandas. You might try to raise it on their tracker, but similar issues have been closed with some flavour of won't-fix or dupe.
16321: Weird behavior when using apply() to create a list based on the current columns
15628: Dataframe.apply does not always return a Series when reduce=True
That latter issue was closed, then reopened and converted into a docs enhancement request some months ago, and it now seems to serve as a dumping ground for any related issues.
Presumably it's not a high priority because, as piRSquared commented (and one of the pandas maintainers commented the same), you're better off with a list comprehension:
pd.Series([list(x) for x in df.itertuples(index=False)])
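Here is a small sketch of the list comprehension on the question's data. itertuples(index=False) yields namedtuples (named "Pandas" by default), and list(x) flattens each one into a plain list. Passing name=None is a documented option that yields plain tuples instead. Adding index=df.index is a choice made here so the result keeps the frame's row labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]})

# Each x is a namedtuple like Pandas(A=6, B=2, C=-5); list(x) drops the field names
rows = pd.Series([list(x) for x in df.itertuples(index=False)], index=df.index)

# name=None makes itertuples yield plain tuples rather than namedtuples
plain = pd.Series([list(x) for x in df.itertuples(index=False, name=None)], index=df.index)

print(rows.tolist())  # [[6, 2, -5], [2, 5, 2], [10, 3, 1], [-5, 2, 8], [3, 6, 2]]
```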
Typically, apply would be used with a NumPy ufunc or something similar.
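For contrast, here is the kind of call apply is normally used with: a NumPy ufunc such as np.abs, which maps a DataFrame to a DataFrame of the same shape. This is only a sketch on the question's data. It also shows that a ufunc can usually be called on the DataFrame directly, without apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]})

# A ufunc passed to apply is evaluated column by column and keeps the shape
abs_df = df.apply(np.abs)

# Calling the ufunc on the DataFrame directly gives the same result
direct = np.abs(df)

print(abs_df["A"].tolist())  # [6, 2, 10, 5, 3]
print(abs_df.equals(direct))  # True
```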