We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


pyodbc multithreading — Vertica Forum

pyodbc multithreading

Hello, 
has anyone tried running a multithreaded python app using pyodbc ? 

I found that the program below serializes at second db2.cursor call in either ODBC or pyodbc so that parallelism is lost . Any ideas where the culprit is ? 

This is running on vertica 6.1.3 with 6.1.3 linux drivers on unixODBC 2.2.14

#!/usr/local/bin/python
import sys, io, os, datetime, pyodbc, threading, time
# db login info
dsn = 'VerticaDSN';password = '...';

def test():   
db = pyodbc.connect("DSN=%s;PWD=%s" % (dsn, password));   
cursor = db.cursor()   
print "first query"   
cursor.execute("create table temp.tn_foo2 as select * from ...  limit 10000000;")   
print "first query finished"


def test2():   
 print "running second query"   
 db2 = pyodbc.connect("DSN=%s;PWD=%s" % (dsn, password));   
cursor2 = db2.cursor()     cursor2.execute("select count(*) from ...;")   
  print "second finished"

print "starting thread"t=threading.Thread(target = test)
t.start()
time.sleep(1)

print "starting thread 2"t=threading.Thread(target = test2)
t.start()

Thanks,

Comments

  • Hi!

    Read about limitations also. You are programming for Vertica, so read a limitation of Vertica
    (or you will start to drive without to learn a rules? its possible, but you will get a lot of accidents). 
    You need connection per query and THREADING=1 defined in odbcint.ini

    Vertica 7

    https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/ProgrammersGuide/ClientODBC/ODBCFeatureSu...
    The following features are not supported:
    • More than one open statement per connection. For example you cannot execute a new statement while another statement has a result set open. If you need to execute multiple statements at once, open multiple database connections.
    Vertica 6
    https://my.vertica.com/docs/6.1.x/HTML/index.htm#12795.htm
    The following features are not supported:
    • More than one open statement per connection. For example you cannot execute a new statement while another statement has a result set open. If you need to execute multiple statements at once, open multiple database connections.
  • As seen above, we are using separate connection handlers to sent the different statements , but they seem to impact one another. There is no limitation described in the 6.1 doc coming close to this use case.
  • Indeed, my bad.
  • Hi Colin!

    I found some python library, that works with Vertica much better than pyodbc IMO.
    You can try it  https://github.com/uber/vertica-python

    When pyodbc fails on PROFILE and INTERVAL values, this library not and it works much more stable.

    Some example (that doesn't work with pyodbc):
    In [24]: cur.execute("select INTERVAL '1 DAY 12 HOURS 30 MINUTES 00 SECONDS';")

    In [25]: cur.fetchone()
    Out[25]: ['1 12:30']

    Actually it doesn't depends on psycopg2, but if you want some features like "parameter bindings", so it require a psycopg2.

    Hope it will help you
    (I succeed to run in parallel, but as subprocess and not thread)
  • Thanks Daniel. I will check out the library.
    I should have updated this thread as well earlier, because i was able to run multiple connections in parallel by mostly upgrading the unixODBC driver manager.

    I upgraded the unixODBC driver manager to 2.3.2 and added in /usr/local/etc/odbcinst.ini a Threading attribute.

    Regards,
    Colin

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file