Finding the index of the last True value in a pandas Series or DataFrame

Question

I'm trying to find the index of the last True value in a pandas boolean Series. My current code looks something like the code below. Is there a faster or cleaner way of doing this?

import numpy as np
import pandas as pd
import string

index = np.random.choice(list(string.ascii_lowercase), size=1000)
df = pd.DataFrame(np.random.randn(1000, 2), index=index)
s = pd.Series(np.random.choice([True, False], size=1000), index=index)

last_true_idx_s = s.index[s][-1]
last_true_idx_df = df[s].iloc[-1].name
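Because the data above is random, the result changes on every run. As a rough illustration, the same two expressions can be applied to a small hand-written Series and DataFrame (the labels and values here are made up for the example, not taken from the question) so the answer can be checked by eye:

```python
import pandas as pd

# Hypothetical, hand-written data so the expected label is obvious.
labels = ["a", "b", "c", "d", "e"]
s = pd.Series([True, False, True, True, False], index=labels)
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=labels)

# Same expressions as in the question.
last_true_idx_s = s.index[s][-1]        # labels where s is True, take the last one
last_true_idx_df = df[s].iloc[-1].name  # rows where s is True, name of the last row

print(last_true_idx_s)   # d
print(last_true_idx_df)  # d
```

Both expressions raise an IndexError when s contains no True value at all, since the filtered index or frame is then empty.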

Recommended answer

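You can reverse the Series and use idxmax, which is the same idea as the argmax approach in Andy Hayden's answer. In the pandas version used here, Series.argmax behaved like idxmax and returned a label; in newer pandas versions argmax returns an integer position instead.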

print(s[::-1].idxmax())
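One caveat is worth spelling out: idxmax returns the label of the first maximum, so on a Series with no True value it still returns a label (the last one, after reversing) rather than failing. A minimal sketch of a guarded version follows; the helper name last_true_label is hypothetical and not part of pandas or the original answer.

```python
import pandas as pd


def last_true_label(s):
    """Return the label of the last True in a boolean Series, or None if there is none.

    Hypothetical helper for illustration; it wraps the s[::-1].idxmax() idiom.
    """
    if not s.any():
        # Without this guard, idxmax would silently return the last label.
        return None
    # Reversing makes the last True the first maximum, which idxmax reports.
    return s[::-1].idxmax()


s = pd.Series([True, False, True, False], index=list("abcd"))
all_false = pd.Series([False, False], index=["x", "y"])

print(last_true_label(s))          # c
print(last_true_label(all_false))  # None
print(all_false[::-1].idxmax())    # y  (misleading: there is no True here)
```

Returning None is only one possible design choice; raising an exception may suit code that treats a missing True value as an error.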

Comparison:

These timings depend heavily on the size of s and on the number and position of the True values.

In [2]: %timeit s.index[s][-1]
The slowest run took 6.92 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 35 µs per loop

In [3]: %timeit s[::-1].argmax()
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 126 µs per loop

In [4]: %timeit s[::-1].idxmax()
The slowest run took 6.55 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 127 µs per loop

In [5]: %timeit s[s==True].last_valid_index()
The slowest run took 8.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 261 µs per loop

In [6]: %timeit (s[s==True].index.tolist()[-1])
The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 239 µs per loop

In [7]: %timeit (s[s==True].index[-1])
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 227 µs per loop
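Since the numbers above come from one machine and one pandas version, it may be worth re-running the comparison locally. The following is a rough sketch using the standard-library timeit module instead of IPython's %timeit; the seed, the Series size, the unique string labels, and the choice of candidates are assumptions made for this example.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: 1000 random booleans with unique string labels.
rng = np.random.default_rng(0)
n = 1000
s = pd.Series(rng.choice([True, False], size=n),
              index=[f"r{i}" for i in range(n)])

# A few of the approaches discussed above, wrapped as callables.
candidates = {
    "s.index[s][-1]": lambda: s.index[s][-1],
    "s[::-1].idxmax()": lambda: s[::-1].idxmax(),
    "s[s].index[-1]": lambda: s[s].index[-1],
}

# Sanity check: every approach must agree before timing means anything.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

number = 200  # calls per timing run
for name, fn in candidates.items():
    # Best of three runs, converted to microseconds per call.
    best = min(timeit.repeat(fn, number=number, repeat=3)) / number
    print(f"{name:<20} {best * 1e6:8.1f} µs per call")
```

Changing n or the proportion of True values should shift the ranking, which is the point made about these timings above.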

Another solution:

print(s[s == True].index[-1])

The solution (s[s==True].index.tolist()[-1]) comes from an answer that has since been deleted.
