Problem description
I need to get rows using a particular column value as the key. Below is my pandas df.
>>> data
   OrderID            TimeStamp  ErrorCode Duration          ResponseType
0  3000000  1488948188555841641        NaN      IOC                   NaN
1  3000000  1488948188556444675          0      NaN     NEW_ORDER_CONFIRM
2  3000000  1488948188556448153          2      NaN         TRADE_CONFIRM
3  3000001  1488948658787676012        NaN      IOC                   NaN
4  3000001  1488948658787811582          1      NaN     NEW_ORDER_CONFIRM
5  3000001  1488948658787824862          2      NaN         TRADE_CONFIRM
6  3000002  1488949064945887091        NaN      IOC                   NaN
7  3000003  1488949109654115659        NaN      IOC                   NaN
8  3000003  1488949109654294973          1      NaN     NEW_ORDER_CONFIRM
9  3000003  1488949109654299930      16388      NaN  CANCEL_ORDER_CONFIRM
I need to select all OrderID values where Duration is IOC (fairly easy), using the approach given in the answer:
orders = data.loc[data.Duration == 'IOC', 'OrderID'].unique()
and then get the rows for those selected OrderIDs where Duration is NaN. Each OrderID will always appear either in a group of 3 rows or as just a single row (for which no output or a null row can be returned, as in the case of OrderID 3000002).
The tricky part is that the ErrorCode in NEW_ORDER_CONFIRM is correct, while the one in TRADE_CONFIRM or CANCEL_ORDER_CONFIRM is wrong. I just want the correct values in my final output rows.
Expected output, row 1:
   OrderID            TimeStamp  ErrorCode Duration   ResponseType
0  3000000  1488948188555841641          0      IOC  TRADE_CONFIRM
I tried doing this in bash, using grep IOC loglife | cut -d, -f1 to get the OrderIDs and then grepping for each OrderID and NaN, but I need a Python solution, which would be much more efficient.
Recommended answer
I think you can first get all unique values of the OrderID column where Duration is IOC, and then select all rows where Duration is NaN by boolean indexing. The mask is created by combining isin with isnull:
# unique() can be omitted, but the solution is then a bit slower on a large df
orders = df.loc[df.Duration == 'IOC', 'OrderID'].unique()
df = df[df.OrderID.isin(orders) & df.Duration.isnull()]
print(df)
OrderID TimeStamp ErrorCode Duration ResponseType
1 3000000 1488948188556448153 2.0 NaN TRADE_CONFIRM
3 3000001 1488948658787824862 2.0 NaN TRADE_CONFIRM
6 3000003 1488949109654299930 16388.0 NaN CANCEL_ORDER_CONFIRM
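The filter above only selects the confirmation rows; on the sample data from the question it keeps both the NEW_ORDER_CONFIRM row and the final TRADE_CONFIRM or CANCEL_ORDER_CONFIRM row for each order. One possible way to build the expected output row (TimeStamp and Duration from the IOC row, ErrorCode from NEW_ORDER_CONFIRM, ResponseType from the final confirmation) is sketched below. This is not part of the original answer. The names ioc, confirms, error_code, response and result are made up for illustration, and the sketch assumes each order has at most one NEW_ORDER_CONFIRM row and at most one other confirmation row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "OrderID": [3000000, 3000000, 3000000, 3000001, 3000001, 3000001,
                3000002, 3000003, 3000003, 3000003],
    "TimeStamp": [1488948188555841641, 1488948188556444675,
                  1488948188556448153, 1488948658787676012,
                  1488948658787811582, 1488948658787824862,
                  1488949064945887091, 1488949109654115659,
                  1488949109654294973, 1488949109654299930],
    "ErrorCode": [np.nan, 0, 2, np.nan, 1, 2, np.nan, np.nan, 1, 16388],
    "Duration": ["IOC", np.nan, np.nan, "IOC", np.nan, np.nan,
                 "IOC", "IOC", np.nan, np.nan],
    "ResponseType": [np.nan, "NEW_ORDER_CONFIRM", "TRADE_CONFIRM",
                     np.nan, "NEW_ORDER_CONFIRM", "TRADE_CONFIRM",
                     np.nan, np.nan, "NEW_ORDER_CONFIRM",
                     "CANCEL_ORDER_CONFIRM"],
})

# Step 1: the filter from the answer (isin + isnull mask).
ioc = data[data.Duration == "IOC"]
orders = ioc["OrderID"].unique()
confirms = data[data.OrderID.isin(orders) & data.Duration.isnull()]

# Step 2 (assumption): one NEW_ORDER_CONFIRM and one other confirmation
# per order, so both lookups below have a unique OrderID index.
is_new = confirms.ResponseType == "NEW_ORDER_CONFIRM"
error_code = confirms[is_new].set_index("OrderID")["ErrorCode"]
response = confirms[~is_new].set_index("OrderID")["ResponseType"]

# Step 3: start from the IOC rows and fill in the two looked-up columns.
result = ioc.copy()
result["ErrorCode"] = result["OrderID"].map(error_code)
result["ResponseType"] = result["OrderID"].map(response)

# Orders with no confirmation rows (e.g. 3000002) are dropped.
result = result.dropna(subset=["ResponseType"])
print(result)
```

If a null row is preferred for single-row orders such as 3000002, the final dropna step can simply be left out.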