Developer: Open Source

Mastering Oracle+Python, Part 1: Querying Best Practices

by Przemyslaw Piotrowski

As a first step, get familiar with the basic concepts of Oracle-Python connectivity

Published September 2007

Amongthe core principles of Python's way of doing things there is a ruleabout having high-level interfaces to APIs. The Database API (in thiscase the Oracle API) is one example. Using the cx_Oracle Python modulefrom Computronix, you can take command over the Oracle query modelwhile maintaining compatibility with Python Database API Specificationv2.0.

The model of querying databases using DB API 2.0 remainsconsistent for all client libraries conforming to the specification. Ontop of this, Anthony Tuininga, the principal developer of cx_Oracle,has added a wide set of properties and methods that exposeOracle-specific features to developers. It is absolutely possible touse only the standard methods and forget about the "extra" ones, but inthis installment you won't be doing that. The concept of universaldatabase wrappers might work in some cases but at the same time, youlose all the optimizations that the RDBMS offers.

Introducing DB API 2.0 and cx_Oracle

ThePython Database API Specification v2.0 is a community effort to unifythe model of accessing different database systems. Having a relativelysmall set of methods and properties, it is easy to learn and remainsconsistent when switching database vendors. It doesn't map databaseobjects to Python structures in any way. Users are still required towrite SQL by hand. After changing to another database, this SQL wouldprobably need to be rewritten. Nevertheless it solves Python-databaseconnectivity issues in an elegant and clean manner.

The specification defines parts of the API such as themodule interface, connection objects, cursor objects, type objects andconstructors, optional extensions to the DB API and optional errorhandling mechanisms.

The gateway between the database andPython language is the Connection object. It contains all theingredients for cooking database-driven applications, not only adheringto the DB API 2.0 but being a superset of the specification methods andattributes. In multi-threaded programs, modules as well as connectionscan be shared across threads; sharing cursors is not supported. Thislimitation is usually acceptable because shareable cursors can carrythe risk of deadlocks.

Python makes extensive use of the exceptionmodel and the DB API defines several standard exceptions that could bevery helpful in debugging problems in the application. Below are thestandard exceptions with a short description of the types of causes:

Warning—Data was truncated during inserts, etc.
Error—Base class for all of the exceptions mentioned here except for Warning
InterfaceError—The database interface failed rather than the database itself (a cx_Oracle problem in this case)
DatabaseError—Strictly a database problem
DataError—Problems with the result data: division by zero, value out of range, etc.
OperationalError—Database error independent of the programmer: connection loss, memory allocation error, transaction processing error, etc.
IntegrityError—Database relational integrity has been affected, e.g. foreign key constraint fails
InternalError—Database has run into an internal error, e.g. invalid cursor, transaction out of synchronization
ProgrammingError—Table not found, syntax error in SQL statement, wrong number of parameters specified etc.
NotSupportedError—A non-existent part of API has been called

The connect process begins with the Connection object, which is thebase for creating Cursor objects. Beside cursor operations, theConnection object also manages transactions with the commit() androllback() methods. The process of executing SQL queries, issuingDML/DCL statements and fetching results are all controlled by cursors.

cx_Oracle extends the standard DB API 2.0 specification in itsimplementation of the Cursor and Connection classes at most. All suchextensions will be clearly marked in the text if needed.

Getting Started

Before working with queries and cursors, a connection to thedatabase needs to be established. The credentials and data source namescan be supplied in one of several ways, with similar results. In theextract from the Python interactive session below, connection objectsdb, db1 and db2 are all equivalent. The makedsn() function creates aTNS entry based on the given parameter values. Here it is beingassigned to the variable dsn_tns. When environment settings areproperly set then you can use the shorter formcx_Oracle.connect('hr/hrpwd'), skipping even the Easy Connect stringused for db and db1.

	  
>>> import cx_Oracle
>>> db = cx_Oracle.connect('hr', 'hrpwd', 'localhost:1521/XE')
>>> db1 = cx_Oracle.connect('hr/hrpwd@localhost:1521/XE')
>>> dsn_tns = cx_Oracle.makedsn('localhost', 1521, 'XE')
>>> print dsn_tns
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521)))
  (CONNECT_DATA=(SID=XE)))
>>> db2 = cx_Oracle.connect('hr', 'hrpwd', dsn_tns)

Within the scope of a Connection object (such as assigned to the dbvariable above) you can get the database version by querying theversion attribute (an extension to DB API 2.0). This can be used tomake Python programs Oracle-version dependent. Likewise, you can getthe connect string for the connection by querying the dsn attribute.

>>> print db.version
10.2.0.1.0
>>> versioning = db.version.split('.')
>>> print versioning
['10', '2', '0', '1', '0']
>>> if versioning[0]=='10':
...     print "Running 10g"
... elif versioning[0]=='9':
...    print "Running 9i"
...
Running 10g
>>> print db.dsn
localhost:1521/XE

Cursor Objects

You can define an arbitrarynumber of cursors using the cursor() method of the Connection object.Simple programs will do fine with just a single cursor, which can beused over and over again. Larger projects might however require severaldistinct cursors.

	  
>>> cursor = db.cursor()

Application logic often requires clearly distinguishing the stages ofprocessing a statement issued against the database. This will helpunderstand performance bottlenecks better and allow writing faster,optimized code. The three stages of processing a statement are:

Parse (optional)

cx_Oracle.Cursor.parse([statement])
Not really required to be called because SQL statements areautomatically parsed at the Execute stage. It can be used to validatestatements before executing them. When an error is detected in such astatement, a DatabaseError exception is raised with a correspondingerror message, most likely "ORA-00900: invalid SQL statement,ORA-01031: insufficient privileges or ORA-00921: unexpected end of SQLcommand."

Execute

cx_Oracle.Cursor.execute(statement, [parameters], **keyword_parameters)
This method can accept a single argument - a SQL statement - to be rundirectly against the database. Bind variables assigned through theparameters or keyword_parameters arguments can be specified as adictionary, sequence, or a set of keyword arguments. If dictionary orkeyword arguments are supplied then the values will be name-bound. If asequence is given, the values will be resolved by their position. Thismethod returns a list of variable objects if it is a query, and Nonewhen it's not.
cx_Oracle.Cursor.executemany(statement, parameters)
Especially useful for bulk inserts because it can limit the number ofrequired Oracle execute operations to just a single one. For moreinformation about how to use it please see the "Many at once" sectionbelow.

Fetch (optional)—Only used for queries (becauseDDL and DCL statements don't return results). On a cursor that didn'texecute a query, these methods will raise an InterfaceError exception.

cx_Oracle.Cursor.fetchall()
Fetches all remaining rowsof the result set as a list of tuples. If no more rows are available,it returns an empty list. Fetch actions can be fine-tuned by settingthe arraysize attribute of the cursor which sets the number of rows toreturn from the database in each underlying request. The higher settingof arraysize, the fewer number of network round trips required. Thedefault value for arraysize is 1.
cx_Oracle.Cursor.fetchmany([rows_no])
Fetches the next rows_no rows from the database. If the parameter isn'tspecified it fetches the arraysize number of rows. In situations whererows_no is greater than the number of fetched rows, it simply gets theremaining number of rows.
cx_Oracle.Cursor.fetchone()
Fetches a single tuple from the database or none if no more rows are available.

Before going forward with cursor examples pleasewelcome the pprint function from the pprint module. It outputs Pythondata structures in a clean, readable form.

	  
>>> from pprint import pprint
>>> cursor.execute('SELECT * FROM jobs')
[, , 
 , ]
>>> pprint(cursor.fetchall())
[('AD_PRES', 'President', 20000, 40000),
 ('AD_VP', 'Administration Vice President', 15000, 30000),
 ('AD_ASST', 'Administration Assistant', 3000, 6000),
 ('FI_MGR', 'Finance Manager', 8200, 16000),
 ('FI_ACCOUNT', 'Accountant', 4200, 9000),
  |
 ('PR_REP', 'Public Relations Representative', 4500, 10500)]

cx_Oracle cursors are iterators. These powerful Pythonstructures let you iterate over sequences in a natural way that fetchessubsequent items on demand only. Costly database select operationsnaturally fit into this idea because the data only gets fetched whenneeded. Instead of creating or fetching the whole result set, youiterate until the desired value is found or another conditionfulfilled.

>>> cursor = db.cursor()
>>> cursor.execute('SELECT * FROM jobs')
[, , 
 , ]
>>> for row in cursor:    ## notice that this is plain English!
...     print row
...
('AD_VP', 'Administration Vice President', 15000, 30000)
('AD_ASST', 'Administration Assistant', 3000, 6000)
('FI_MGR', 'Finance Manager', 8200, 16000)
('FI_ACCOUNT', 'Accountant', 4200, 9000)
('AC_MGR', 'Accounting Manager', 8200, 16000)
 |
('PR_REP', 'Public Relations Representative', 4500, 10500)

Just after an execute list(cursor) does the same job ascursor.fetchall(). This is because the built-in list() functioniterates until the end of the given iterator.

Datatypes

During the fetch stage, basicOracle data types get mapped into their Python equivalents. cx_Oraclemaintains a separate set of data types that helps in this transition.The Oracle - cx_Oracle - Python mappings are:

Oracle	cx_Oracle	Python
VARCHAR2 NVARCHAR2 LONG	`cx_Oracle.STRING`	`str`
CHAR	cx_Oracle.FIXED_CHAR
NUMBER	cx_Oracle.NUMBER	int
FLOAT	float
DATE	cx_Oracle.DATETIME	datetime.datetime
TIMESTAMP	cx_Oracle.TIMESTAMP
CLOB	cx_Oracle.CLOB	cx_Oracle.LOB
BLOB	cx_Oracle.BLOB

The above data types are usuallytransparent to the user except for cases involving Large Objects. As ofversion 4.3, cx_Oracle still handles them itself and not wrapped withthe built-in file type.

Otherdata types that are not yet handled by cx_Oracle include XMLTYPE andall complex types. All queries involving columns of unsupported typeswill currently fail with a NotSupportedError exception. You need toremove them from queries or cast to a supported data type.

For example, consider the following table for storing aggregated RSS feeds:

	  
CREATE TABLE rss_feeds (
  feed_id NUMBER PRIMARY KEY,
  feed_url VARCHAR2(250) NOT NULL,
  feed_xml XMLTYPE
);

When trying to query this table with Python, some additionalsteps need to be performed. In the example below XMLType.GetClobVal()is used to return XML from the table as CLOB values.

>>> cursor.execute('SELECT * FROM rss_feeds')

Traceback (most recent call last):
  File "", line 1, in 
    cursor.execute('SELECT * FROM rss_feeds')
NotSupportedError: Variable_TypeByOracleDataType: unhandled data type 108
>>> cursor.execute('SELECT feed_id, feed_url, XMLType.GetClobVal(feed_xml) FROM rss_feeds')
[, , 
 ]

You might have already noticed the cx_Oracle.Cursor.execute*family of methods returns column data types for queries. These arelists of Variable objects (an extension to DB API 2.0), which get thevalue None before the fetch phase and proper data values after thefetch. Detailed information about data types is available through thedescription attribute of cursor objects. The description is a list of7-item tuples where each tuple consists of a column name, column type,display size, internal size, precision, scale and whether null ispossible. Note that column information is only accessible for SQLstatements that are queries.

>>> column_data_types = cursor.execute('SELECT * FROM employees')
>>> print column_data_types
[, , 
 , , 
 , , 
 , , 
 , , 
 ]
>>> pprint(cursor.description)
[('EMPLOYEE_ID', , 7, 22, 6, 0, 0),
 ('FIRST_NAME', , 20, 20, 0, 0, 1),
 ('LAST_NAME', , 25, 25, 0, 0, 0),
 ('EMAIL', , 25, 25, 0, 0, 0),
 ('PHONE_NUMBER', , 20, 20, 0, 0, 1),
 ('HIRE_DATE', , 23, 7, 0, 0, 0),
 ('JOB_ID', , 10, 10, 0, 0, 0),
 ('SALARY', , 12, 22, 8, 2, 1),
 ('COMMISSION_PCT', , 6, 22, 2, 2, 1),
 ('MANAGER_ID', , 7, 22, 6, 0, 1),
 ('DEPARTMENT_ID', , 5, 22, 4, 0, 1)]

Bind Variable Patterns

As advertised by Oracle guru Tom Kyte, bind variables are coreprinciples of database development. They do not only make programs runfaster but also protect against SQL injection attacks. Consider thefollowing queries:

SELECT * FROM emp_details_view WHERE department_id=50
SELECT * FROM emp_details_view WHERE department_id=60
SELECT * FROM emp_details_view WHERE department_id=90
SELECT * FROM emp_details_view WHERE department_id=110

When run one-by-one, each need to be parsed separately which addsextra overhead to your application. By using bind variables you cantell Oracle to parse a query only once. cx_Oracle supports bindingvariables by name or by position.

Passing bind variables by name requires the parameters argumentof the execute method to be a dictionary or a set of keyword arguments.query1 and query2 below are equivalent:

	  
>>> named_params = {'dept_id':50, 'sal':1000}
>>> query1 = cursor.execute('SELECT * FROM employees 
  WHERE department_id=:dept_id AND salary>:sal', named_params)
>>> query2 = cursor.execute('SELECT * FROM employees 
  WHERE department_id=:dept_id AND salary>:sal', dept_id=50, sal=1000)

When using named bind variables you can check the currently assigned ones using the bindnames() method of the cursor:

	  
>>> print cursor.bindnames()
['DEPT_ID', 'SAL']

Passing by position is similar but you need to be careful about naming.Variable names are arbitrary so it's easy to mess up queries this way.In the example below, all three queries r1, r2, and r3 are equivalent.The parameters variable must be given as a sequence.

>>> r1 = cursor.execute('SELECT * FROM locations 
 WHERE country_id=:1 AND city=:2', ('US', 'Seattle'))
>>> r2 = cursor.execute('SELECT * FROM locations 
 WHERE country_id=:9 AND city=:4', ('US', 'Seattle'))
>>> r3 = cursor.execute('SELECT * FROM locations 
 WHERE country_id=:m AND city=:0', ('US', 'Seattle'))

When binding, you can first prepare the statement and thenexecute None with changed parameters. Oracle will handle it as in theabove case, governed by the rule that one prepare is enough whenvariables are bound. Any number of executions can be involved forprepared statements.

>>> cursor.prepare('SELECT * FROM jobs WHERE min_salary>:min')
>>> r = cursor.execute(None, {'min':1000})
>>> print len(cursor.fetchall())
19

You have already limited the number of parses. In the nextparagraph we'll be eliminating unnecessary executions, especiallyexpensive bulk inserts.

Many at Once

Large insert operations don't require many separate inserts becausePython fully supports inserting many rows at once with thecx_Oracle.Cursor.executemany method. Limiting the number of executeoperations improves program performance a lot and should be the firstthing to think about when writing applications heavy on INSERTs.

Let's create a table for a Python module list, this time directly from Python. You will drop it later.

>>> create_table = """
CREATE TABLE python_modules (
  module_name VARCHAR2(50) NOT NULL,
  file_path VARCHAR2(300) NOT NULL
)
"""
>>> from sys import modules
>>> cursor.execute(create_table)
>>> M = []
>>> for m_name, m_info in modules.items():
...   try:
...     M.append((m_name, m_info.__file__))
...   except AttributeError:
...     pass
...
>>> len(M)
76
>>> cursor.prepare("INSERT INTO python_modules(module_name, file_path) VALUES (:1, :2)")
>>> cursor.executemany(None, M)
>>> db.commit()
>>> r = cursor.execute("SELECT COUNT(*) FROM python_modules")
>>> print cursor.fetchone()
(76,)
>>> cursor.execute("DROP TABLE python_modules PURGE")

Only one execute has been issued to the database to insert all76 module names. This is huge performance boost for large insertoperations. Notice two small quirks here: cursor.execute(create_tab)doesn't produce any output since it is a DDL statement and (76,) is atuple with a single element. (76) without a comma would simply beequivalent to an integer 76.

Conclusion

After familiarizing yourself with basic concepts of Oracle-Pythonconnectivity you are ready to start writing your own database-drivenapplications. I highly recommend playing with the Python interactiveshell for some time as it really brings down the learning curve.

You have learned about three stages that SQL statements go throughand how to minimize the number of steps the Oracle Database needs toperform. Bind variables are an inevitable part of database applicationdevelopment and Python enables binding them by name or by position.

You have also been introduced to the smooth transition betweenOracle and Python datatypes and the natural way of handling databasedata in the context of handling cursors as iterators. All thesefeatures boost productivity and enable focusing on the data, which iswhat it's all about.

Mastering Oracle+Python, Part 2: Working with Times and Dates
Mastering Oracle+Python, Part 3: Data Parsing

lyy_wt

Python cx_Oracle 操作Oracle 数据库