Vertica 7 ODBC driver hangs while inserting data

Hi guys,
Thanks for building such a great product.

I'm using Vertica ODBC drivers (v7.0.0, Community Edition) via ODBC API to connection Vertica7 server and to load about 200MB data.
Several strange issues encountered.

1. During inserting data, it might failed with error message,

Jan 06 14:51:40 INFO  7868 Statement::SQLGetStmtAttrW: Attribute: 10010Jan 06 14:51:40 INFO  7868 Statement::SQLGetStmtAttrW: Attribute: 10011Jan 06 14:51:40 INFO  7868 Statement::SQLGetStmtAttrW: Attribute: 10012Jan 06 14:51:40 INFO  7868 Statement::SQLGetStmtAttrW: Attribute: 10013Jan 06 14:51:40 INFO  7868 VDataEngine::Prepare: Original Query: INSERT INTO CM3TM2 ("NUMBER","ERP_TYPE","ERP_SID","ERP_CLIENT","ERP_GATEWAY_ID","ERP_APPROVER","ERP_IMPORTED","ERP_DEVELOPMENT_CLIENT","ERP_DEVELOPMENT_SID","ERP_DEVELOPMENT_GATEWAY_ID","ESTIMATE_UNITS","ESTIMATE_PRICE","ESTIMATE_DESCRIPTION","ESTIMATE_DELIVERY","ESTIMATE_BUDGET","ESTIMATE_EFFORT","ESTIMATE_GRADE","ACTUAL_UNITS","ACTUAL_COST","ACTUAL_PRICE","ACTUAL_GRADE","CORP_STRUCTURE","NEXT_BREACH","ID","TASK_UPDATE","COMPLETION_CODE","HOURS_WORKED","CLOSING_COMMENTS","PARTS","LABOR","SYSMODCOUNT","SYSMODUSER","SYSMODTIME","INITIAL_IMPACT","SEVERITY","FOLDER","ACTUALEND","ACTUALSTART","AFFECTED_ITEM","REJECTREASON","IMPACT_SEVERITY","RISK_SEVERITY","COLLISION_SEVERITY","APPROVALCOMMENTS","REVIEWER_ASSIGN_DEPT","REVIEWER_ASSIGNED_TO","REVIEW_COMMENTS") VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)Jan 06 14:51:40 INFO  7868 VDataEngine::Prepare: Query: INSERT INTO CM3TM2 ("NUMBER","ERP_TYPE","ERP_SID","ERP_CLIENT","ERP_GATEWAY_ID","ERP_APPROVER","ERP_IMPORTED","ERP_DEVELOPMENT_CLIENT","ERP_DEVELOPMENT_SID","ERP_DEVELOPMENT_GATEWAY_ID","ESTIMATE_UNITS","ESTIMATE_PRICE","ESTIMATE_DESCRIPTION","ESTIMATE_DELIVERY","ESTIMATE_BUDGET","ESTIMATE_EFFORT","ESTIMATE_GRADE","ACTUAL_UNITS","ACTUAL_COST","ACTUAL_PRICE","ACTUAL_GRADE","CORP_STRUCTURE","NEXT_BREACH","ID","TASK_UPDATE","COMPLETION_CODE","HOURS_WORKED","CLOSING_COMMENTS","PARTS","LABOR","SYSMODCOUNT","SYSMODUSER","SYSMODTIME","INITIAL_IMPACT","SEVERITY","FOLDER","ACTUALEND","ACTUALSTART","AFFECTED_ITEM","REJECTREASON","IMPACT_SEVERITY","RISK_SEVERITY","COLLISION_SEVERITY","APPROVALCOMMENTS","REVIEWER_ASSIGN_DEPT","REVIEWER_ASSIGNED_TO","REVIEW_COMMENTS") VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

Jan 06 14:51:40 INFO  7868 VQueryExecutor::FinishStreamingExecution: Sending end of load message
Jan 06 14:51:40 ERROR 7868 Statement::SQLPrepareW: [Vertica][ODBC] (10000) General error: Unexpected exception has been caught.

- Then, if we continue loading other data, there would be random same error threw, and then the ODBC driver would be hung. 
   When the odbc driver hangs, from the call stack, it seems hangs in "free()" routine, sorry cannot provide full call stack.

- If we disconnect with vertica db and reconnect and retry the insert, the process would succeed.

- BTW, I'm using Win7 64bit, but my client is a 32-bit unicode application.

2. Another issue is, during data loading, I also need to fetch some BLOB data (long varbinary(655360) ) from database. And randomly, it might fetch incorrect data content,

Then, if fetch the same data again immediately, the result would be correct.

- More info about "incorrect data content",  the data size seems correct, but the data is incorrect. 
- And in the ODBC DSN data source, the "Driver string Conversions" is set to none.
  The option "Report Unicode columns as char" is unchecked. 
  SSL and Kerberos is not enabled.
  "Auto commit " is also unchecked.

I'm not sure whether this issue is result incorrect client application coding or caused by some defect in ODBC driver.
Could you please share more insight?

BTW, I'm not familiar with Vertica (only started 2 weeks ago) , if you need more information, please feel free to drop me a mail or ping me via OC (I'm working at HP, too :) )

Comments

  • Hi Sheng,

    Hm...  These are odd issues...  I don't work on the client-driver team personally, but I know we test for things like this and haven't seen these issues in testing.

    Do you happen to have a reproducer?  Some pattern or example code that relatively consistently reproduces either issue; including schema, sample data, etc.  (Maybe it has to run for a while in a loop or something; that's fine.)  That would really help us track this down.

    Thanks,
    Adam
  • Hi Adam,

    Thanks for your helps.

    Currently I don't have simple code to reproduce the issue. It's encountered when integrated with my product (HP Service Manager server).

    Maybe I can try to isolate and reproduce by some simple code, will update you in next week.

    Best Regards,
    Sheng-Yu

  • Hi Adam,

    Quick update, here is the sample output about incorrect BLOB(long varbinary) content,

    1.  Here is what I got via ODBC driver
         - The datalength 391 byte, is correct
         - The hex dump of raw data is incorrect, because from 10th char, the string should be "sctypecheck", not "sctype" only.
         - The blob is a compressed string which could only be decoded by the client application. (I know it's might not a good practice to store BLOB data in Vertica, we just evaluating it because there are pretty much BLOB fields in our system)
         - I didn't use bind parameters in the query. When using bind parameters, it seems more stable, but still might encounter this error sometimes.

    _prepare:SELECT * FROM DATADICTM1 READCOMMITTED WHERE "NAME"='sctypecheck'- EXECUTE:SELECT * FROM DATADICTM1 READCOMMITTED WHERE "NAME"='sctypecheck'- FETCH:SELECT * FROM DATADICTM1 READCOMMITTED WHERE "NAME"='sctypecheck'

    _getLobData - SQLGetData - LOB:DESCRIPTOR 

    sqTOD          : DESCRIPTOR                     #1: sql type=-1 , c type = -2 : 0x02F424F0-391
     
    0x02F424F0: 5F524346 4D2A3D90 2D0B7363 74797065 [_RCFM*=.-.sctype] 
    0x02F42500: 6D697363 656C6C61 6E656F75 73006C6C [miscellaneous.ll] 
    0x02F42510: 616E656F 75732F4F 2C626562 653D0841 [aneous/O,bebe=.A] 
    0x02F42520: CF51392B 5111112D 0763626C 616E636B [.Q9+Q..-.cblanck] 
    0x02F42530: 3D0841CF 7774E14E EEEF4F2F 2F2F802D [=.A.wt.N..O///.-] 
    0x02F42540: 07636F6E 76657274 2D066669 656C6473 [.convert-.fields] 
    0x02F42550: 2D096669 6C652E6E 616D652D 0B737973 [- file.name-.sys] 
    0x02F42560: 6D6F6463 6F756E74 2D0A7379 736D6F64 [modcount-.sysmod]
    0x02F42570: 74696D65 2D0A7379 736D6F64 75736572 [time-.sysmoduser] 
    0x02F42580: 73637479 70656368 65636B00 06466965 [sctypecheck..Fie] 
    0x02F42590: 6C64732D 0946696C 65204E61 6D652D0B [lds- File Name-.] 
    0x02F425A0: 5379736D 6F64636F 756E742D 0A537973 [Sysmodcount-.Sys] 
    0x02F425B0: 6D6F6474 696D652D 0A537973 6D6F6475 [modtime-.Sysmodu]  


    2. Output of VSQL which seems correct.
        - It's run in same machine of ODBC client application (WIN7 64bit).
        - The output seems correct because there is "sctypecheck" in the beginning.

    sm=> SELECT DESCRIPTOR FROM DATADICTM1 READCOMMITTED WHERE "NAME"='sctypecheck';

    DESCRIPTOR _RCFM*=\220-\013sctypecheck@\201\200\201\200\024\031\010\022\021\023\022\201\200\201\200\201\200AAAAAA\201\200\201\200\201\200\201\026-\007cblanck=\010A\317wt\341N\356\357////O\200\201O\200-\011file.name\201/\037\037O\200\201/////////\200\201/-\013sctypecheck/\037////O/\200\201\200\201OOO//\200\201O/O/O/OO\200\201OO\200\201/\200\201A///\037\037\037/O\200\201\200\201\200\024\022\022\021\023\022\201\200\201\200\201\200\201\200\201O/\200\201\221(1 row)

    3. The table schema

    sm=> select column_name, data_type, data_type_id, data_type_length, character_maximum_length  from columns where table_name ilike 'datadictm1'; 
     column_name    |       data_type        | data_type_id | data_type_length | character_maximum_length
    ------------------+------------------------+--------------+------------------+-------------------------- 
    DESCRIPTOR       | long varbinary(655360) |          116 |           655360 | NAME             | varchar(104)           |            9 |              104 |                      104 
    SYSTEM           | varchar(54)            |            9 |               54 |                       54 
    TYPE             | varchar(54)            |            9 |               54 |                       54 
    SQLBASENAME      | varchar(44)            |            9 |               44 |                       44
    SC_MANAGE_SCREEN | varchar(64)            |            9 |               64 |                       64
    LOCALIZEDTABLE   | char(1)                |            8 |                1 |                        1 

    4. Our sample code to fetch the data


    /* execute a ';'-terminated SQL select statement */
    SQ_CODE sqvtcSelect( RELBLK *pRel, char *stmt, SQLDA *sqlda, PUSER_BLOCK tuserp)
    {
      SQLRETURN sqlrc;
      SQ_CODE rc;
      SQLHANDLE lhStmt = HNULL;

      if ( rc = sqvtcConnect( FALSE, tuserp ) )
        return rc;

      lhStmt = _allocateStatementHandle( stmt, tuserp );
      if ( HNULL == lhStmt )
        return SQ_ERROR;

      do  { 
       /*    Prepare the statement and bind the columns using the    already built SQLDA    */   

    if ( ( rc = _prepare( lhStmt, stmt, tuserp ) ) != SQ_SUCCESS ) 
        break;

    //    if ( ( rc = _setBindVariables( lhStmt, stmt, tuserp ) ) != SQ_SUCCESS )
    //      break;

        /* Execute the statement to get the record selected */   
    START_TIMER( SQL_MODULE_PREFIX"Select - EXECUTE", stmt );
     sqlrc = SQLExecute( lhStmt );
    STOP_TIMER( SQL_MODULE_PREFIX"Select - EXECUTE", stmt );

    rc = _errorStatement( "SQLExecute [in "SQL_MODULE_PREFIX"Select]", stmt, ROLLBACK, sqlrc, lhStmt, tuserp );

    if ( rc != SQ_SUCCESS )
          break;
    /* Fetch the record into the buffers */

    START_TIMER( SQL_MODULE_PREFIX"Select - FETCH", stmt );
    sqlrc = SQLFetch( lhStmt );
    rc = _errorStatement( "SQLFetch [in "SQL_MODULE_PREFIX"Select]", stmt, ROLLBACK, sqlrc, lhStmt, tuserp );

    if ( rc != SQ_SUCCESS ) 

        STOP_TIMER( SQL_MODULE_PREFIX"Select - FETCH NOT FOUND", stmt );
          break;

    STOP_TIMER( SQL_MODULE_PREFIX"Select - FETCH", stmt );
        /* Transfer data into our buffers */

    rc = _getLobData( sqlda, lhStmt, tuserp );  } while ( 0 );

    _freeStatementHandle( lhStmt, tuserp );  lhStmt = HNULL;
    // sqvtcFreeBindvars( tuserp );                    /* free allocated bind vars */
    return rc;

    }


    SQ_CODE _getLobData(SQLDA *sqlda, SQLHSTMT hstmt, PUSER_BLOCK tuserp)
    {
      // ...;

      #define LOB_BUF1_LEN  327670
        char buffer[LOB_BUF1_LEN];
        memset(buffer, 0, LOB_BUF1_LEN);

        SQLINTEGER totalLength = 0;
        v->sqlcurlen = SQL_NULL_DATA;
        sqlrc = SQLGetData( hstmt, ni, v->sqlctype, buffer, LOB_BUF1_LEN, &totalLength );

        /* SUCCESS_WITH_INFO means there is data in this column (not NULL), and it is 'totalLength' long */ 

    if ( sqlrc == SQL_SUCCESS_WITH_INFO )
      {
        // Another weird thing, SQLGetData never return correct the real data length if we pass a short buffer in previous code. So I have to create a big enough buffer to hold the result.
        //;
     }
     else if ( totalLength > 0 ) 
     {
        v->sqlcurlen = totalLength;
        v->sqldata = sqBufAllocEx( totalLength, tuserp );    ACOPY(v->sqldata, buffer, totalLength);

      }

      // ...; 

     }
  • Thanks for the detailed report!  That's very helpful.

    If you have a similar reproducer for the data-insertion issue, let us know.
  • Hi Adam,

    I can reproduce the data-insertion issue (hung) via some test code (I'm not sure whether it's corrected).

    What I did is,
    0. In ODBC data source configuration, the "AUTO COMMIT" is disabled
    1. Fetch all records from tableA
    2. Insert 50 rows to tableB (using Bind variables)
    3. Fetch all records from tableB
    4. Repeat step 1-3 for 10000 times, and it always hung on step2 in the very beginning.

    I have set the ODBC driver log level to TRACE and will send you a mail (HP mail box if I'm right)  along with the test code.


    Thanks & Best Regards

    Sheng-Yu

  • Hi Sheng-Yu,

    Thanks for the update.  It sounds like this might not reproduce consistently, though?

    One thing that would be useful -- if you do observe the issue again, could you check the "LOCKS" system table?  (Also, is your transaction level SERIALIZABLE?  Vertica's default, and the use case that we're tuned for, is READ-COMMITTED.)  Anything involving a hang and a mix of inserts and updates sounds suspiciously like lock contention to me.  The LOCKS system table will tell you what locks are held and by which running sessions/transactions, including (I believe) background/system transactions, etc.

    The original error ("Unexpected exception has been caught") sounds like a separate issue to me; usually exceptions don't cause hangs in this particular code, they cause errors per your initial post.  But it's possible that they're the same issue.

    Adam
  • Hi Adam,

    Thanks a lot for your reply.

    1. Yes, it cannot be reproduced consistently.
    2. The transaction isolation level is "READ COMMITTED", which is configured in ODBC data source
    3. About original error ("Unexpected exception has been caught"), I cannot reproduce it via test code.

    Is it possible caused by incorrect locale setting or something like code page conversion, I'm a bit lost now.
    Or is it possible caused by incorrect parameter passed into ODBC API for varchar/BLOB fields?

    Do you have sample code for read/write of long varchar/varbinary fields, the sample in the "Programming Guide" seems too simple?

    BR,
    Sheng-Yu

  • Hi Sheng-Yu,

    I don't think it's likely that the hang is caused by an incorrect locale setting.  I could believe that the initial crash was somehow locale-related -- I wouldn't expect Vertica to crash (though never trust any software engineer who claims their code is 100% bug-free :-) ), but many programs do depend on having characters encoded in a particular way, and might break if they receive apparently-corrupt (actually just incorrectly-encoded) data.

    For the hang, I would still encourage you to take a look at the LOCKS system table when you see it happening.  There's much more lock contention in SERIALIZABLE mode, but READ-COMMITTED can still have it under the right circumstances.

    Also, I should probably clear up a misconception:  LONG VARCHAR is not a BLOB type.  It is simply a VARCHAR that is, well, long :-)  Same for LONG VARBINARY.  The difference is subtle, especially with a column-store database.  (A major advantage of BLOBs in a conventional database is that they are intended to be stored separately from your main heap file, so a very large object shouldn't affect scan performance.  But column stores already store each of your columns separately.)  But this does have a variety of implications for performance and usability.  Some positive, some negative; depends a lot on your perspective and use case.

    Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file