Options

how to improve the insert performance

In our application, we try to insert records into Vertica using JDBC. there are about 40 columns in the table.But the performance is really poor. It took 55 seconds to insert 1500 records. Are there any tips or workaround to improve the insert performance?
the following is the table definition and the insert sqls:
CREATE TABLE dws.xxxxx (
       REQUEST_ID Varchar(100),
       SRC_LASTMODDATE Timestamp,
       USER_DATA_01 Varchar(800),
       USER_DATA_02 Varchar(800),
       USER_DATA_03 Varchar(800),
       USER_DATA_04 Varchar(800),
       USER_DATA_05 Varchar(800),
       USER_DATA_06 Varchar(800),
       USER_DATA_07 Varchar(800),
       USER_DATA_08 Varchar(800),
       USER_DATA_09 Varchar(800),
       USER_DATA_10 Varchar(800),
       USER_DATA_11 Varchar(800),
       USER_DATA_12 Varchar(800),
       USER_DATA_13 Varchar(800),
       USER_DATA_14 Varchar(800),
       USER_DATA_15 Varchar(800),
       USER_DATA_16 Varchar(800),
       USER_DATA_17 Varchar(800),
       USER_DATA_18 Varchar(800),
       USER_DATA_19 Varchar(800),
       USER_DATA_20 Varchar(800),
       VISIBLE_USER_DATA_01 Varchar(800),
       VISIBLE_USER_DATA_02 Varchar(800),
       VISIBLE_USER_DATA_03 Varchar(800),
       VISIBLE_USER_DATA_04 Varchar(800),
       VISIBLE_USER_DATA_05 Varchar(800),
       VISIBLE_USER_DATA_06 Varchar(800),
       VISIBLE_USER_DATA_07 Varchar(800),
       VISIBLE_USER_DATA_08 Varchar(800),
       VISIBLE_USER_DATA_09 Varchar(800),
       VISIBLE_USER_DATA_10 Varchar(800),
       VISIBLE_USER_DATA_11 Varchar(800),
       VISIBLE_USER_DATA_12 Varchar(800),
       VISIBLE_USER_DATA_13 Varchar(800),
       VISIBLE_USER_DATA_14 Varchar(800),
       VISIBLE_USER_DATA_15 Varchar(800),
       VISIBLE_USER_DATA_16 Varchar(800),
       VISIBLE_USER_DATA_17 Varchar(800),
       VISIBLE_USER_DATA_18 Varchar(800),
       VISIBLE_USER_DATA_19 Varchar(800),
       VISIBLE_USER_DATA_20 Varchar(800)
);

INSERT INTO dws.xxxxx VALUES('0','2005.02.01 12:00:01','dummy_1306','dummy_720','dummy_717','dummy_713','dummy_516','dummy_1209','dummy_388','dummy_250','dummy_1248','dummy_577','dummy_859','dummy_500','dummy_100','dummy_404','dummy_73','dummy_612','dummy_1096','dummy_1061','dummy_692','dummy_1439','dummy_536','dummy_1240','dummy_446','dummy_85','dummy_217','dummy_771','dummy_296','dummy_55','dummy_838','dummy_445','dummy_134','dummy_1497','dummy_957','dummy_1457','dummy_570','dummy_1060','dummy_1341','dummy_1423','dummy_888','dummy_140');
INSERT INTO dws.xxxxx VALUES('1','2005.02.01 12:00:02','dummy_57','dummy_1005','dummy_204','dummy_498','dummy_280','dummy_1271','dummy_368','dummy_1260','dummy_1029','dummy_460','dummy_1002','dummy_256','dummy_213','dummy_104','dummy_379','dummy_1426','dummy_1022','dummy_915','dummy_1495','dummy_964','dummy_770','dummy_1101','dummy_849','dummy_140','dummy_1095','dummy_85','dummy_1430','dummy_1013','dummy_816','dummy_907','dummy_286','dummy_412','dummy_1449','dummy_1451','dummy_484','dummy_305','dummy_780','dummy_168','dummy_1499','dummy_163');
INSERT INTO dws.xxxxx VALUES('2','2005.02.01 12:00:03','dummy_510','dummy_176','dummy_1310','dummy_404','dummy_65','dummy_906','dummy_152','dummy_925','dummy_1023','dummy_402','dummy_1366','dummy_108','dummy_450','dummy_1040','dummy_170','dummy_200','dummy_1349','dummy_749','dummy_1244','dummy_136','dummy_470','dummy_652','dummy_536','dummy_74','dummy_641','dummy_585','dummy_827','dummy_1116','dummy_1410','dummy_879','dummy_160','dummy_436','dummy_1298','dummy_435','dummy_170','dummy_365','dummy_470','dummy_560','dummy_80','dummy_295');

... ..


Comments

  • Options

    Hm...  Are you literally running the statements as below?

     

    Vertica is very slow to start running an INSERT statement.  If you only load one value with that statement, you will tend to see poor performance.  What you can do, though, is to prepare a single INSERT statement and use it repeatedly to load many values in sequence.  If Vertica is your bottleneck, this should be dramatically faster.

     

    You might also consider taking a look at your schema:  You have lots of VARCHAR(800)'s, but your actual data is much narrower than that.  Vertica generally loads at very roughly a constant number of bytes per second -- this is far from a hard rule, but very wide rows with lots of data per row will of course load more slowly than very narrow rows with very little data per row.  If your data is narrow, you can tell this to Vertica by making your variable-size fields smaller; this allows Vertica to optimize a little more effectively and may give you a performance boost.

     

    If you really need maximum performance for loading lots of data, Vertica's COPY statement (accessible over JDBC via the VerticaCopyStream API) will give you the maximum performance and configurability for large bulk data loads.  But I'd try the above first; they should help a lot as well.

     

    Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file