Options

Vertica ODBC bulk load from memory?

running vertica 7.0.1 on a single linux server.
from 1-5 client applications use ODBC to execute a prepared statement which loads a batch of around 20,000 - 100,000 records (22 columns, ~100 bytes/rec)  each minute using the standard odbc bind()  - where each column gets a values vector and an indicator vector filled - and then sqlexecute(). This is currently implemented and works without issue.

what I'm wondering is:
1) can data be loaded in this use case by instead using COPY... FROM (using sqlexecutedirect) where loading is not from a file or named pipe, but from a buffer in memory (~10MB for 100K records)
2) would it be faster than the odbc prepared statements?

As an aside: Does the ODBC driver convert the prepared statement column value vectors into a COPY... FROM STDIN internally? I seem to remember seeing this assertion in a blog somewhere. If yes, is there a way to tune the batch size? eg, on the command line i can write a bunch of successive COPY... FROM... NO COMMIT  and then finally COMMIT which is presumably a lot faster than committing per row.




Comments

  • Options
    Hi!

    1. Yes, for example data on ram disk has a valid file descriptor (/dev/shm) or create memory mapped file or named pipe and stream data to pipe while copy from pipe (from file descriptor). You can do it with sockets (sockets are valid file descriptors too), stream to socket -> COPY reads from socket. Data source can be anything(while streaming) - buffer, mmap, file, whatever

    2. prepared statements automatically converted to COPY (with AUTOCOMMIT off)

    For example (vertica odbc log, level - trace):
    Apr 30 19:20:38 DEBUG 743249728 VQueryExecutor::ProcessParams: sqltype=-5, Oidtype=6, Len=1
    Apr 30 19:20:38 TRACE 743249728 VQueryExecutor::Execute: AUTOCOMMIT property is off, leaving COPY running
    Apr 30 19:20:38 TRACE 743249728 VQueryExecutor::GetResults: +++++ enter +++++

    Batch Insert Steps

    The steps your application needs to take in order to perform an ODBC Batch Insert are:

    1. Connect to the database.
    2. Disable autocommit for the connection.
    3. Create a prepared statement that inserts the data you want to load.
    4. Bind the parameters of the prepared statement to arrays that will contain the data you want to load.
    5. Populate the arrays with the data for your batches.
    6. Execute the prepared statement.
    7. Optionally, check the results of the batch load to find rejected rows.
    8. Repeat the previous three steps until all of the data you want to load is loaded.
    9. Commit the transaction.
    10. Optionally, check the results of the entire batch transaction.
    The following example code demonstrates a simplified version of the above steps.
    ...
    https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/NewFeatures/4.0/UsingBatchInserts.htm%3FT...


    Using Prepared Statements

    Server-side prepared statements are useful for:

    • Optimizing queries. HP Vertica only needs to parse the statement once.
    • Preventing SQL injection attacks. A SQL injection attack occurs when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and thereby unexpectedly run. Since a prepared statement is parsed separately from the input data, there is no chance the data can be accidentally executed by the database.
    • Binding direct variables to return columns. By pointing to data structures, the code doesn't have to perform extra transformations.
    https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/ProgrammersGuide/ClientDriverMisc/UsingPr...
  • Options
    thanks for the reply, I'll pay attention to that odbc log with -trace
  • Options
    Hi!

    Yes, with trace you can see all your actions and how them interpreted by driver. Sometimes very useful.
     
    An odbc session:
    In [13]: conn = pyodbc.connect(dsn='v7', user='daniel', autocommit=False)

    In [14]: cursor = conn.cursor()

    In [15]: cursor.executemany('insert into test values(?)', [(1,), (2,), (3,)])

    In [16]: cursor.execute('select get_num_accepted_rows()')
    Out[16]: <pyodbc.Cursor at 0x16b2cf0>

    In [17]: cursor.fetchone()
    Out[17]: (3L, )

    In [18]: cursor.commit()

    In [20]: conn.close()

    In [21]:
    Trace (a lot of lines omitted):
    ... VConnection::Connect: +++++ enter +++++
    ... VConnection::Connect: Connect to host :5433, db dev
    ...
    ... VConnection::BeginTransaction: +++++ enter +++++
    ...
    ... VDataEngine::Prepare: +++++ enter +++++
    ... VDataEngine::Prepare: Original Query: insert into test values(?)
    ... VDataEngine::Prepare: Query: insert into test values(?)
    ... VQueryExecutor::VQueryExecutor: +++++ enter +++++
    ... VDataEngine::Prepare: Initializing prepared statement: _PLAN0x19857e0_0
    ... VQueryExecutor::InitializePreparedExecutor: +++++ enter +++++
    ... VQueryExecutor::ClearCancel: +++++ enter +++++
    ... VQueryExecutor::PopulateParameters: +++++ enter +++++
    ...
    ... VQueryExecutor::ProcessParams: sqltype=-5, Oidtype=6, Len=1
    ... VQueryExecutor::Execute: AUTOCOMMIT property is off, leaving COPY running
    ... VQueryExecutor::GetResults: +++++ enter +++++
    ...
    ... VQueryExecutor::FinishStreamingExecution: +++++ enter +++++
    ...
    ... VDataEngine::Prepare: +++++ enter +++++
    ... VDataEngine::Prepare: Original Query: select get_num_accepted_rows()
    ... VDataEngine::Prepare: Query is issued as ExecDirect
    ... VDataEngine::Prepare: Query: select get_num_accepted_rows()
    ... VQueryExecutor::VQueryExecutor: +++++ enter +++++
    ... VDataEngine::Prepare: Initializing prepared statement: _PLAN0x1986700_1
    ... VQueryExecutor::InitializePreparedExecutor: +++++ enter +++++
    ...
    ... VBaseResultSet::DoMoveToNext: +++++ enter +++++
    ... VBaseResultSet::DoMoveToNext: m_rowItr: -1, m_currentResultSetSize: 1, resultStatus: 2
    ... VPQResultSet::GetSelectColumns: +++++ enter +++++
    ... VPQResultSet::GetData&colon; +++++ enter +++++
    ... Connection::SQLEndTran: +++++ enter +++++
    ...
    ... Connection::SQLDisconnect: +++++ enter +++++
    ... ConnectionState4::SQLDisconnect: +++++ enter +++++
    ... VConnection::Disconnect: +++++ enter +++++
    
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file