pyodbc multithreading

Colin_Loghin_1 · February 2014

Hello,
has anyone tried running a multithreaded python app using pyodbc ?

I found that the program below serializes at second db2.cursor call in either ODBC or pyodbc so that parallelism is lost . Any ideas where the culprit is ?

This is running on vertica 6.1.3 with 6.1.3 linux drivers on unixODBC 2.2.14

#!/usr/local/bin/python
import sys, io, os, datetime, pyodbc, threading, time
# db login info
dsn = 'VerticaDSN';password = '...';

def test():
db = pyodbc.connect("DSN=%s;PWD=%s" % (dsn, password));
cursor = db.cursor()
print "first query"
cursor.execute("create table temp.tn_foo2 as select * from ... limit 10000000;")
print "first query finished"

def test2():
print "running second query"
db2 = pyodbc.connect("DSN=%s;PWD=%s" % (dsn, password));
cursor2 = db2.cursor() cursor2.execute("select count(*) from ...;")
print "second finished"

print "starting thread"t=threading.Thread(target = test)
t.start()
time.sleep(1)

print "starting thread 2"t=threading.Thread(target = test2)
t.start()

Thanks,

Daniel_Leybovic · February 2014

Hi!

Read about limitations also. You are programming for Vertica, so read a limitation of Vertica
(or you will start to drive without to learn a rules? its possible, but you will get a lot of accidents).
You need connection per query and THREADING=1 defined in odbcint.ini

Vertica 7

https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/ProgrammersGuide/ClientODBC/ODBCFeatureSu...
The following features are not supported:

More than one open statement per connection. For example you cannot execute a new statement while another statement has a result set open. If you need to execute multiple statements at once, open multiple database connections.

Vertica 6
https://my.vertica.com/docs/6.1.x/HTML/index.htm#12795.htm
The following features are not supported:

More than one open statement per connection. For example you cannot execute a new statement while another statement has a result set open. If you need to execute multiple statements at once, open multiple database connections.

colin_loghin · February 2014

As seen above, we are using separate connection handlers to sent the different statements , but they seem to impact one another. There is no limitation described in the 6.1 doc coming close to this use case.

Daniel_Leybovic · February 2014

Indeed, my bad.

Daniel_Leybovic · February 2014

Hi Colin!

I found some python library, that works with Vertica much better than pyodbc IMO.
You can try it https://github.com/uber/vertica-python

When pyodbc fails on PROFILE and INTERVAL values, this library not and it works much more stable.

Some example (that doesn't work with pyodbc):

In [24]: cur.execute("select INTERVAL '1 DAY 12 HOURS 30 MINUTES 00 SECONDS';")

In [25]: cur.fetchone()
Out[25]: ['1 12:30']

Actually it doesn't depends on psycopg2, but if you want some features like "parameter bindings", so it require a psycopg2.

Hope it will help you
(I succeed to run in parallel, but as subprocess and not thread)

colin_loghin · February 2014

Thanks Daniel. I will check out the library.
I should have updated this thread as well earlier, because i was able to run multiple connections in parallel by mostly upgrading the unixODBC driver manager.

I upgraded the unixODBC driver manager to 2.3.2 and added in /usr/local/etc/odbcinst.ini a Threading attribute.

Regards,
Colin

We're Moving!

Create My New Community Account Now

pyodbc multithreading

Comments

Leave a Comment