Problem description
I am trying to pass time-series data from Python to q/kdb+.
One solution out there is the qPython module, which offers seamless conversion from q tables and dictionaries to pandas.
The problem is that when passing data from pandas to q, the time index of the DataFrame (in the column Date) doesn't make it to the q side. Reproducible code:
import pandas_datareader.data as web # pandas.io.data was moved to the separate pandas-datareader package
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.iloc[:5] # explore first 5 rows of the DataFrame (.ix has been removed from pandas)
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
#     Open   High    Low  Close     Volume  Adj Close
# 0  10.17  10.28  10.05  10.28   60855800       9.43
# 1  10.45  11.24  10.40  10.96  215620200      10.05
# 2  11.21  11.46  11.13  11.37  200070600      10.43
# 3  11.46  11.69  11.32  11.66  130201700      10.69
# 4  11.67  11.74  11.46  11.69  130463000      10.72
As you can see, the q table doesn't have the Date column that was present as the index of the f DataFrame.
How can I efficiently (for large data) pass the datetime index to q?
Recommended answer
While serializing DataFrame objects, qPython checks for the presence of a meta attribute. If the attribute is not present, the DataFrame is serialized as a plain q table and the index columns are skipped in the process. If you want to preserve the index columns, you have to set the meta attribute and provide a type hint that forces representation as a q keyed table.
See the modified example:
import pandas_datareader.data as web # pandas.io.data was moved to the separate pandas-datareader package
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
from qpython import MetaData
from qpython.qtype import QKEYED_TABLE
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.iloc[:5] # explore first 5 rows of the DataFrame (.ix has been removed from pandas)
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72
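An alternative that is not part of the answer above, offered only as a sketch: since qPython skips the index when no meta hint is set, the index can be turned into an ordinary column on the pandas side before sending. The tiny DataFrame below is a hypothetical stand-in for the Yahoo Finance download, and the qPython calls in the trailing comments are assumptions that were not run here.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: two rows with Date as the index.
f = pd.DataFrame(
    {"Open": [10.17, 10.45], "Close": [10.28, 10.96]},
    index=pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date"),
)

# reset_index() moves the index into a regular column named Date,
# so it becomes part of the column data rather than the index.
flat = f.reset_index()

print(list(f.columns))     # Date is not among the columns of the original frame
print(list(flat.columns))  # Date is now the first column
print(pd.api.types.is_datetime64_any_dtype(flat["Date"]))

# Assumption (needs an open qPython connection `q` as in the example above):
#   q('set', numpy.string_('yahoo'), flat)   # send as a plain, unkeyed table
#   q('`Date xkey `yahoo')                   # optionally key the table on Date in q
```

Compared with setting the meta attribute, this sends Date as a plain column and leaves the decision to key the table to the q side.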