Options

pyodbc multithreading

Hello, 
has anyone tried running a multithreaded python app using pyodbc ? 

I found that the program below serializes at second db2.cursor call in either ODBC or pyodbc so that parallelism is lost . Any ideas where the culprit is ? 

This is running on vertica 6.1.3 with 6.1.3 linux drivers on unixODBC 2.2.14

#!/usr/local/bin/python
import sys, io, os, datetime, pyodbc, threading, time
# db login info
dsn = 'VerticaDSN';password = '...';

def test():   
db = pyodbc.connect("DSN=%s;PWD=%s" % (dsn, password));   
cursor = db.cursor()   
print "first query"   
cursor.execute("create table temp.tn_foo2 as select * from ...  limit 10000000;")   
print "first query finished"


def test2():   
 print "running second query"   
 db2 = pyodbc.connect("DSN=%s;PWD=%s" % (dsn, password));   
cursor2 = db2.cursor()     cursor2.execute("select count(*) from ...;")   
  print "second finished"

print "starting thread"t=threading.Thread(target = test)
t.start()
time.sleep(1)

print "starting thread 2"t=threading.Thread(target = test2)
t.start()

Thanks,

Comments

  • Options
    Hi!

    Read about limitations also. You are programming for Vertica, so read a limitation of Vertica
    (or you will start to drive without to learn a rules? its possible, but you will get a lot of accidents). 
    You need connection per query and THREADING=1 defined in odbcint.ini

    Vertica 7

    https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/ProgrammersGuide/ClientODBC/ODBCFeatureSu...
    The following features are not supported:
    • More than one open statement per connection. For example you cannot execute a new statement while another statement has a result set open. If you need to execute multiple statements at once, open multiple database connections.
    Vertica 6
    https://my.vertica.com/docs/6.1.x/HTML/index.htm#12795.htm
    The following features are not supported:
    • More than one open statement per connection. For example you cannot execute a new statement while another statement has a result set open. If you need to execute multiple statements at once, open multiple database connections.
  • Options
    As seen above, we are using separate connection handlers to sent the different statements , but they seem to impact one another. There is no limitation described in the 6.1 doc coming close to this use case.
  • Options
    Indeed, my bad.
  • Options
    Hi Colin!

    I found some python library, that works with Vertica much better than pyodbc IMO.
    You can try it  https://github.com/uber/vertica-python

    When pyodbc fails on PROFILE and INTERVAL values, this library not and it works much more stable.

    Some example (that doesn't work with pyodbc):
    In [24]: cur.execute("select INTERVAL '1 DAY 12 HOURS 30 MINUTES 00 SECONDS';")

    In [25]: cur.fetchone()
    Out[25]: ['1 12:30']

    Actually it doesn't depends on psycopg2, but if you want some features like "parameter bindings", so it require a psycopg2.

    Hope it will help you
    (I succeed to run in parallel, but as subprocess and not thread)
  • Options
    Thanks Daniel. I will check out the library.
    I should have updated this thread as well earlier, because i was able to run multiple connections in parallel by mostly upgrading the unixODBC driver manager.

    I upgraded the unixODBC driver manager to 2.3.2 and added in /usr/local/etc/odbcinst.ini a Threading attribute.

    Regards,
    Colin

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file