本文介绍了为什么Pandas和GeoPandas能够使用psycopg2连接读取数据库表,却不得不依靠SQLAlchemy来写一个?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

尝试从 Python3 脚本.

I just get into trouble while trying to do some I/O operations on some databases from a Python3 script.

当我想连接到数据库时,我习惯了使用 psycopg2 来处理连接光标.

When I want to connect to a database, I habitually use psycopg2 in order to handle the connections and cursors.

我的数据通常存储为 Pandas DataFrames 和/或 GeoPandas 的等效 GeoDataFrames .

My data are usually stored as Pandas DataFrames and/or GeoPandas's equivalent GeoDataFrames.

我可以依靠其 .read_sql() 方法,该方法以 con 作为参数,如doc中所述:

I can rely on its .read_sql() methods which takes as a parameter con, as stated in the doc:

con : SQLAlchemy connectable (engine/connection) or database str URI
        or DBAPI2 connection (fallback mode)'
        Using SQLAlchemy makes it possible to use any DB supported by that
        library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible
        for engine disposal and connection closure for the SQLAlchemy connectable. See
        `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_

使用 GeoPandas :

我可以依靠它的 .read_postigs() 方法,该方法将 con 作为参数,如文档中所述:

Using GeoPandas:

I can rely on its .read_postigs() methods which takes as a parameter con, as stated in the doc:

con : DB connection object or SQLAlchemy engine
        Active connection to the database to query.

为了数据写入数据库表;

使用 Pandas :

我可以依靠 .to_sql() 方法,该方法以 con 作为参数,如文档中所述:

In order to write data to a database table;

Using Pandas:

I can rely on the .to_sql() methods which takes as a parameter con, as stated in the doc:

con : sqlalchemy.engine.Engine or sqlite3.Connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library. Legacy support is provided for sqlite3.Connection objects. The user
        is responsible for engine disposal and connection closure for the SQLAlchemy
        connectable See `here                 <https://docs.sqlalchemy.org/en/13/core/connections.html>`_

使用 GeoPandas :

我可以依靠 .to_sql()方法(直接依赖于 Pandas .to_sql() )作为参数 con ,如文档中所述:

Using GeoPandas:

I can rely on the .to_sql() methods (which directly relies on the Pandas .to_sql()) which takes as a parameter con, as stated in the doc:

con : sqlalchemy.engine.Engine or sqlite3.Connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library. Legacy support is provided for sqlite3.Connection objects. The user
        is responsible for engine disposal and connection closure for the SQLAlchemy
        connectable See `here                 <https://docs.sqlalchemy.org/en/13/core/connections.html>`_

从这里,我很容易理解 GeoPandas 是基于 Pandas ,尤其是其 GeoDataFrame 对象,不久就是一个特殊的 DataFrame 可以处理地理数据.

From here, I easily understand that GeoPandas is built on Pandas especially for its GeoDataFrame object, which is, shortly, a special DataFrame that can handle geographic data.

但是我想知道为什么 GeoPandas 能够直接使用 psycopg2 连接作为参数,而不是 Pandas ,如果计划用于后者?

But I'm wondering why do GeoPandas has the ability to directly takes a psycopg2 connection as an argument and not Pandas and if it is planned for the latter?

为什么在写入数据时没有一种情况?
我想(可能还有许多其他的人,)直接给他们一个 psycopg2 connections 参数而不是依赖 SQLAlchemy 引擎.
因为这个工具真的很棒,所以它使我使用两个不同的框架连接到数据库,从而处理了两个不同的连接字符串(我个人更喜欢 psycopg2 处理字典中的参数扩展以正确构建连接字符串,例如; psycopg2.connect(** dict_params) URL 注入有关例如:是吗可以将字典传递到SQLAlchemy中的create_engine函数吗?).

And why is it neither the case for one nor the other when it comes to writing data?
I would like (as probably many of others) to directly give them a psycopg2 connections argument instead of relying on SQLAlchemy engine.
Because even is this tool is really great, it makes me use two different frameworks to connect to my database and thus handle two different connection strings (and I personally prefer the way psycopg2 handles the parameters expansion from a dictionary to build a connection string properly such as; psycopg2.connect(**dict_params) vs URL injection as explained here for example: Is it possible to pass a dictionary into create_engine function in SQLAlchemy?).

  1. 我首先使用 psycopg2 创建我的连接字符串这样的参数字典:

  1. I was first creating my connection string with psycopg2 from a dictionary of parameters this way:

connParams = ("user={}", "password={}", "host={}", "port={}", "dbname={}")
conn = ' '.join(connParams).format(*dict_params.values())

  • 然后我发现这种方法更好,也更pythonic:

  • Then I figured out it was better and more pythonic this way:

    conn = psycopg2.connect(**dict_params)
    

  • 最后我替换了它,以便可以互换使用它来构建 psycopg2 连接 SQLAlchemy引擎:

    def connector():
        return psycopg2.connect(**dict_params)
    

    a)初始化 psycopg2 连接:

    conn = connector()
    curs = conn.cursor()
    

    b)并初始化 SQLAlchemy 引擎方式:

    engine = create_engine('postgresql+psycopg2://', creator=connector)
    

  • (或与您的任何风味的 db + driver 一起使用)

    (or with any of your flavored db+driver)

    这在这里有很好的记录:
    https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
    而在这里:
    https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine

    This is well documented here:
    https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
    and here:
    https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine

    [1] 没有SQL Alchemy引擎的SQL数据框
    [2] 如何编写数据框到Postgres表而不使用SQLAlchemy引擎?

    推荐答案

    to_sql 需要SQLAlchemy Connectable ( Engine Connection )对象是 to_sql 需要能够创建数据库表(如果该表不存在或需要替换).大熊猫的早期版本只能与DBAPI连接一起使用,但是我怀疑当他们在 to_sql 中添加新功能时,他们发现自己编写了许多特定于数据库的代码来解决各种DDL实现的怪癖.

    Probably the main reason why to_sql needs a SQLAlchemy Connectable (Engine or Connection) object is that to_sql needs to be able to create the database table if it does not exist or if it needs to be replaced. Early versions of pandas worked exclusively with DBAPI connections, but I suspect that when they were adding new features to to_sql they found themselves writing a lot of database-specific code to work around the quirks of the various DDL implementations.

    意识到他们正在复制SQLAlchemy中已经存在的许多逻辑后,他们很可能决定通过简单地接受 Engine / Connection将所有复杂性外包"给SQLAlchemy本身.对象,并使用SQLAlchemy(独立于数据库)的SQL Expression语言创建表.

    On realizing that they were duplicating a lot of logic that was already in SQLAlchemy they likely decided to "outsource' all of that complexity to SQLAlchemy itself by simply accepting an Engine/Connection object and using SQLAlchemy's (database-independent) SQL Expression language to create the table.

    这篇关于为什么Pandas和GeoPandas能够使用psycopg2连接读取数据库表,却不得不依靠SQLAlchemy来写一个?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

    08-11 02:17