Problem description
I have a DataFrame from which I want to drop rows depending on their index name:
col1 col2
entry_1 10 11
entry_2_test 12 13
entry_3 14 15
entry_4_test 16 17
Basically, I want to drop the rows whose index ends with _test.
I know how to select them:
df.filter(like='_test', axis=0)
col1 col2
entry_2_test 12 13
entry_4_test 16 17
Then I can get their index labels:
df.filter(like='_test', axis=0).index
entry_2_test
entry_4_test
Finally, I can drop those labels and overwrite my DataFrame with the filtered result:
df = df.drop(df.filter(like='_test', axis=0).index)
df
col1 col2
entry_1 10 11
entry_3 14 15
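For reference, here is a self-contained sketch of the question's approach, rebuilding the sample frame shown above. One thing worth knowing: filter(like=...) keeps labels that contain the string anywhere, so it would also catch a hypothetical label such as entry_test_5 that merely contains _test without ending in it.

```python
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame(
    {"col1": [10, 12, 14, 16], "col2": [11, 13, 15, 17]},
    index=["entry_1", "entry_2_test", "entry_3", "entry_4_test"],
)

# Step 1: select the rows whose label contains '_test' and take their labels
to_drop = df.filter(like="_test", axis=0).index

# Step 2: drop those labels, producing a new DataFrame
result = df.drop(to_drop)
print(result)
```

This works, but it builds an intermediate DataFrame only to throw it away; a boolean mask on the index avoids that.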
My question is whether this is the correct way to filter, or whether there is a more efficient, dedicated function for this.
Recommended answer
You can use str.endswith on the index and invert the resulting boolean mask with ~:
In[13]:
df.loc[~df.index.str.endswith('_test')]
Out[13]:
col1 col2
entry_1 10 11
entry_3 14 15
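A runnable sketch of the str.endswith approach, again rebuilding the sample frame. It shows the intermediate mask explicitly: endswith yields one boolean per label, and ~ flips it so .loc keeps the rows that do not end with '_test'.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [10, 12, 14, 16], "col2": [11, 13, 15, 17]},
    index=["entry_1", "entry_2_test", "entry_3", "entry_4_test"],
)

# One boolean per row: True where the label ends with '_test'
mask = df.index.str.endswith("_test")

# ~ inverts the mask, so only the non-'_test' rows are kept
result = df.loc[~mask]
print(result)
```

Because this is a single vectorized pass over the index with no intermediate DataFrame, it is the more direct answer to the "dedicated function" question.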
Alternatively, slice the last 5 characters of each label and compare them using !=:
In[18]:
df.loc[df.index.str[-5:]!='_test']
Out[18]:
col1 col2
entry_1 10 11
entry_3 14 15
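The hard-coded 5 in the slice is just len('_test'). A minimal sketch that derives the slice length from the suffix instead; the suffix variable is a hypothetical name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [10, 12, 14, 16], "col2": [11, 13, 15, 17]},
    index=["entry_1", "entry_2_test", "entry_3", "entry_4_test"],
)

suffix = "_test"  # hypothetical variable so the slice length is not hard-coded

# .str[-n:] takes the last n characters of each label
last_chars = df.index.str[-len(suffix):]

# Keep rows whose trailing characters differ from the suffix
result = df.loc[last_chars != suffix]
print(result)
```

If the suffix ever changes, only the string needs updating, which avoids the slice length and the string drifting apart.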
It's still possible to use filter by passing a regex pattern that keeps only the rows whose labels don't end with '_test':
In[25]:
df.filter(regex='.*[^_test]$', axis=0)
Out[25]:
col1 col2
entry_1 10 11
entry_3 14 15
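As pointed out by @user3483203, it's better to use a negative lookbehind. The pattern above only works by accident: [^_test] is a character class, so it merely checks that the last character is not one of _, t, e or s, and it would also drop a label like entry_5_best. The lookbehind rejects only labels whose final characters are exactly _test: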
df.filter(regex='.*(?<!_test)$', axis=0)
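A small sketch contrasting the two patterns. It adds one hypothetical extra row, labelled entry_5_best, that ends in 't' but not in '_test', which is exactly the case where the character-class pattern goes wrong.

```python
import pandas as pd

# Sample frame plus one hypothetical row, 'entry_5_best'
df = pd.DataFrame(
    {"col1": [10, 12, 14, 16, 18], "col2": [11, 13, 15, 17, 19]},
    index=["entry_1", "entry_2_test", "entry_3", "entry_4_test", "entry_5_best"],
)

# Character class: only requires the LAST character to be outside {_, t, e, s}
char_class = df.filter(regex=".*[^_test]$", axis=0)

# Negative lookbehind: rejects labels whose final five characters are '_test'
lookbehind = df.filter(regex=".*(?<!_test)$", axis=0)

print(list(char_class.index))  # ['entry_1', 'entry_3']
print(list(lookbehind.index))  # ['entry_1', 'entry_3', 'entry_5_best']
```

Only the lookbehind version keeps entry_5_best, which is the intended behaviour.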