本文介绍了传递给kdb +时,pandas DataFrame删除索引(使用qPython API)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将时间序列数据从Python传递到q/kdb+.

I am trying to pass time-series data from Python to q/kdb+.

一种解决方案是 qPython模块,可从q表/字典进行无缝转换到熊猫.

One solution out there is qPython module, offering seamless conversion from q table/dictionary to Pandas.

问题是,当尝试将熊猫传递到q时,DataFrame中的时间索引(在列Date中)不能完全到达q边.可复制的代码:

The problem is when trying to pass from Pandas to q, the time index in DataFrame (in the column Date) doesn't quite make it to the q side. Reproducible code:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
#    Open  High  Low  Close    Volume  Adj Close
# 0 10.17 10.28 10.05 10.28  60855800       9.43
# 1 10.45 11.24 10.40 10.96 215620200      10.05
# 2 11.21 11.46 11.13 11.37 200070600      10.43
# 3 11.46 11.69 11.32 11.66 130201700      10.69
# 4 11.67 11.74 11.46 11.69 130463000      10.72

如您所见,q表没有f DataFrame中存在的Date列作为索引.

As you can see, the q table doesn't have the Date column that was present in f DataFrame as index.

如何有效地(对于大数据)将日期时间索引传递给q?

How to efficiently (for large data) pass the datetime index to q?

推荐答案

在序列化DataFrame对象时,qPython检查meta属性的存在.如果该属性不存在,则DataFrame被序列化为q表,并且在此过程中跳过索引列.如果要保留索引列,则必须设置meta属性并提供类型提示以强制表示q键表.

While serializing DataFrame objects the qPython checks for the presence of meta attribute. If the attribute is not present, DataFrame is serialized as q table and index columns are skipped in the process. If you want to preserve the index columns, you have to set the meta attribute and provide type hinting to enforce representation a q keyed table.

请查看修改后的示例:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

from qpython import MetaData
from qpython.qtype import QKEYED_TABLE


start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72

这篇关于传递给kdb +时,pandas DataFrame删除索引(使用qPython API)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-06 18:11