Problem description
I have a pandas DataFrame that looks like this:
qseqid sseqid qstart qend
2 1 125 345
4 1 150 320
3 2 150 450
6 2 25 300
8 2 50 500
I would like to remove rows based on the values of other rows, using this criterion: a row (r1) must be removed if another row (r2) exists with the same sseqid and r1[qstart] > r2[qstart] and r1[qend] < r2[qend].
Is this possible with pandas?
Recommended answer
import pandas as pd

df = pd.DataFrame({'qend': [345, 320, 450, 300, 500],
                   'qseqid': [2, 4, 3, 6, 8],
                   'qstart': [125, 150, 150, 25, 50],
                   'sseqid': [1, 1, 2, 2, 2]})

def remove_rows(df):
    # Pair every row with every row that shares its sseqid
    merged = pd.merge(df.reset_index(), df, on='sseqid')
    # True where the left row (_x) lies strictly inside the right row (_y)
    mask = ((merged['qstart_x'] > merged['qstart_y'])
            & (merged['qend_x'] < merged['qend_y']))
    # Keep only the index labels that never appear as a contained row
    df_mask = ~df.index.isin(merged.loc[mask, 'index'].values)
    result = df.loc[df_mask]
    return result

result = remove_rows(df)
print(result)
yields
qend qseqid qstart sseqid
0 345 2 125 1
3 300 6 25 2
4 500 8 50 2
The idea is to use pd.merge to form a DataFrame containing every pairing of rows that share the same sseqid:
In [78]: pd.merge(df.reset_index(), df, on='sseqid')
Out[78]:
index qend_x qseqid_x qstart_x sseqid qend_y qseqid_y qstart_y
0 0 345 2 125 1 345 2 125
1 0 345 2 125 1 320 4 150
2 1 320 4 150 1 345 2 125
3 1 320 4 150 1 320 4 150
4 2 450 3 150 2 450 3 150
5 2 450 3 150 2 300 6 25
6 2 450 3 150 2 500 8 50
7 3 300 6 25 2 450 3 150
8 3 300 6 25 2 300 6 25
9 3 300 6 25 2 500 8 50
10 4 500 8 50 2 450 3 150
11 4 500 8 50 2 300 6 25
12 4 500 8 50 2 500 8 50
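The size of this table follows from the group sizes: a group of n rows with the same sseqid contributes n * n pairings (each row is also paired with itself). The short check below is only a sketch of that arithmetic on the sample data; the names group_sizes and expected_pairs are introduced here for illustration. Because the growth is quadratic per group, very large groups can make the merged frame expensive in memory.

```python
import pandas as pd

df = pd.DataFrame({'qend': [345, 320, 450, 300, 500],
                   'qseqid': [2, 4, 3, 6, 8],
                   'qstart': [125, 150, 150, 25, 50],
                   'sseqid': [1, 1, 2, 2, 2]})

# Self-merge: every row paired with every row sharing its sseqid
merged = pd.merge(df.reset_index(), df, on='sseqid')

# Rows per sseqid group (2 rows for sseqid 1, 3 rows for sseqid 2)
group_sizes = df.groupby('sseqid').size()

# Each group of n rows yields n * n pairings: 2*2 + 3*3
expected_pairs = int((group_sizes ** 2).sum())

print(len(merged), expected_pairs)
```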
Each row of merged contains data from two rows of df. You can then compare every pair of rows using
mask = ((merged['qstart_x'] > merged['qstart_y'])
        & (merged['qend_x'] < merged['qend_y']))
and find the labels in df.index that do not match this condition:
df_mask = ~df.index.isin(merged.loc[mask, 'index'].values)
Then select those rows:
result = df.loc[df_mask]
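Note that this assumes df has a unique index. If labels were repeated, df.index.isin would drop every row sharing a label with a contained row, not just the contained row itself.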
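If the index may contain duplicate labels, one possible workaround is to track each row by its position instead of its label. The following is a minimal sketch under that assumption, not part of the original answer; the function name remove_rows_any_index and the helper column _pos are hypothetical, and it assumes df has no existing column called _pos.

```python
import numpy as np
import pandas as pd

def remove_rows_any_index(df):
    # Hypothetical variant: tag each row with its position so that
    # duplicate index labels cannot be confused with one another.
    positions = np.arange(len(df))
    left = df.assign(_pos=positions)
    merged = left.merge(df, on='sseqid', suffixes=('_x', '_y'))
    # True where the left row (_x) lies strictly inside the right row (_y)
    mask = ((merged['qstart_x'] > merged['qstart_y'])
            & (merged['qend_x'] < merged['qend_y']))
    # Positions of rows that are contained in some other row
    drop_pos = merged.loc[mask, '_pos'].unique()
    # Boolean array aligned by position, independent of index labels
    keep = ~np.isin(positions, drop_pos)
    return df[keep]

# Same sample data, but with deliberately repeated index labels
dup = pd.DataFrame({'qend': [345, 320, 450, 300, 500],
                    'qseqid': [2, 4, 3, 6, 8],
                    'qstart': [125, 150, 150, 25, 50],
                    'sseqid': [1, 1, 2, 2, 2]},
                   index=[0, 0, 1, 1, 1])
kept = remove_rows_any_index(dup)
print(kept)
```

Calling df.reset_index(drop=True) before remove_rows would be a simpler alternative when the original labels do not need to be preserved.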