问题描述
我试图将解析的 dta 数据插入 postgresql 数据库,每行都是一个单独的变量表,它一直在工作,直到我在第二行中添加recodeid_fk".我现在尝试运行此代码时遇到的错误是:pg8000.errors.ProgrammingError: ('ERROR', '42601', 'syntax error at or near "imp"').
I am attempting to insert parsed dta data into a postgresql database with each row being a separate variable table, and it was working until I added in the second row "recodeid_fk". The error I now get when attempting to run this code is: pg8000.errors.ProgrammingError: ('ERROR', '42601', 'syntax error at or near "imp"').
最终,我希望能够同时解析多个文件并将数据插入到数据库中,但如果有人能帮助我了解现在发生了什么,那就太棒了.我使用的是Python 2.7.5,statareader来自pandas 0.12的开发记录,我对Python的经验很少.
Eventually, I want to be able to parse multiple files at the same time and insert the data into the database, but if anyone could help me understand whats going on now it would be fantastic. I am using Python 2.7.5, the statareader is from pandas 0.12 development records, and I have very little experience in Python.
dr = statareader.read_stata('file.dta')
a = 2
t = 1
for t in range(1,10):
z = str(t)
for date, row in dr.iterrows():
cur.execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]),29))
a += 1
t += 1
conn.commit()
cur.close()
conn.close()
推荐答案
针对您的具体错误...
语法错误可能来自需要用引号括起来的字符串 {}
.execute()
可以自动为你处理这些.替换
The syntax error probably comes from strings {}
that need quotes around them. execute()
can take care of this for you automtically. Replace
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES({}, {})".format(z, str(row[a]),29))
execute("INSERT INTO tblv00{} (data, recodeid_fk) VALUES(%s, %s)".format(z), (row[a],29))
表名的填写方式与之前相同,但值将由execute
填充,如果需要,它会插入引号.也许 execute
也可以填写表名,我们可以完全删除 format
,但这将是一个不寻常的用法,我猜execute代码> 可能(错误地)在名称中间加上引号.
The table name is completed the same way as before, but the the values will be filled in by execute
, which inserts quotes if they are needed. Maybe execute
could fill in the table name too, and we could drop format
entirely, but that would be an unusual usage, and I'm guessing execute
might (wrongly) put quotes in the middle of the name.
但是有一个更好的方法......
Pandas 包含 一个用于将 DataFrames 写入的函数SQL 表.Postgresql 尚不支持,但在简单的情况下,您应该可以假装连接到 sqlite 或 MySQL 数据库并且没有问题.
Pandas includes a function for writing DataFrames to SQL tables. Postgresql is not yet supported, but in simple cases you should be able to pretend that you are connected to sqlite or MySQL database and have no trouble.
你对 z
的意图是什么?实际上,在继续下一个 for 循环之前,您将 z 从 '1'
循环到 '9'
.循环应该嵌套吗?也就是说,您的意思是将内容 dr
插入到名为 tblv001
到 tblv009
的九个不同表中吗?
What do you intend with z
here? As it is, you loop z from '1'
to '9'
before proceeding to the next for loop. Should the loops be nested? That is, did you mean to insert the contents dr
into nine different tables called tblv001
through tblv009
?
如果您的意思是循环将 dr
的不同部分放入不同的表中,请检查代码的缩进并加以澄清.
If you mean that loop to put different parts of dr
into different tables, please check the indentation of your code and clarify it.
无论哪种情况,上面的链接都应该处理 SQL 插入.
In either case, the link above should take care of the SQL insertion.
对编辑的回应
看起来 t
、z
和 a
正在做多余的事情.怎么样:
It seems like t
, z
, and a
are doing redundant things. How about:
import pandas as pd
import string
...
# Loop through columns of dr, and count them as we go.
for i, col in enumerate(dr):
table_name = 'tblv' + string.zfill(i, 3) # e.g., tblv001 or tblv010
df1 = DataFrame(dr[col]).reset_index()
df1.columns = ['data', 'recodeid_fk']
pd.io.sql.write_frame(df1, table_name, conn)
我使用 reset_index
将索引变成一列.write_frame
不会保存新的(顺序)索引.
I used reset_index
to make the index into a column. The new (sequential) index will not be saved by write_frame
.
这篇关于尝试将数据插入 postgresql 时出现语法错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!