我目前正在从foral循环中的sqlalchemy查询中提取数据,并将不同的设备ID / accon_time对作为变量进行迭代
这个想法是一次提取一个设备/时间对的数据,并将其附加到熊猫数据帧以供以后处理
这样做的原因是大量或什至是小批量提取数据,这为我们的数据库带来了很高的IOPS警报
最初,我尝试仅在循环中使用df.append(),但是从查询对象中收到类型错误。
TypeError:无法连接类型为“ class”的对象
sqlalchemy.orm.query.Query“;仅Series和DataFrame objs有效
然后,我决定尝试将查询对象转换为序列,然后将序列转换为数据框:
for i in range(len(df)):
dev_id = (df.iloc[i]['dev_id'])
accon_time = (df.iloc[i]['accon_time'])
new_session = Session()
gps_qry = new_session.query(GPS).filter(GPS.dev_id == dev_id).filter(GPS.accon_time == accon_time). \
filter(GPS.tm_time.between(20190620145813, 20190620151400))
gps_qry = [repr(x) for x in gps_qry]
ser_2 = pd.Series(data=gps_qry)
#This part here is to split the comma seperated values in the series into 3 seperate columns:
df_2 = ser_2.apply(lambda x: pd.Series([val.split('=')[1] for val in x[1:-1].split(',')]))
df_2 = df_2.append(gps_qry, ignore_index=True)
df_2.columns = ['dev_id', 'accon_time', 'tm_time']
print(df_2)
我期望这样的事情:
dev_id accon_time tm_time
0 'A' 'B' 'C'
1 'D' 'E' 'F'
2 'G' 'H' 'I'
但是结果是这样的:
dev_id accon_time tm_time
0 'A' 'B' 'C'
1 'D,E,F...' NaN NaN
我不太确定发生了什么,但这是我最接近解决此问题的方法,因为其他所有方法都引发类型错误
最佳答案
不确定我是否能完整了解您的挑战,但是可以通过访问使用ORM构建的原始查询并将sqlalchemy
与pandas
结合使用,并按照@furas的建议将其传递给read_sql
。这样,您可以继续使用ORM进行查询创建。
例如:
gps_qry = new_session.query(GPS).filter(GPS.dev_id == dev_id).filter(GPS.accon_time == accon_time). \
filter(GPS.tm_time.between(20190620145813, 20190620151400))
gps_df = pd.read_sql(gps_qry.statement, con=db.engine)
## or
gps_df = pd.read_sql(str(gps_qry), con=db.engine)
或在循环中使用:
df_2 = pd.DataFrame()
for i in range(len(df)):
dev_id = (df.iloc[i]['dev_id'])
accon_time = int((df.iloc[i]['accon_time'])) ##updated to use python int.
new_session = Session()
gps_qry = new_session.query(GPS).filter(GPS.dev_id == dev_id).filter(GPS.accon_time == accon_time). \
filter(GPS.tm_time.between(20190620145813, 20190620151400))
### append in each loop
gps_df = pd.read_sql(str(gps_qry), con=db.engine)
df_2 = pd.concat([df_2, gps_df], axis=0, sort=False, ignore_index=True)
print(df_2)