global catalog lock held too long
Hello!
I'm experiencing performance degradation with ETL processes on a Vertica cluster.
Initially my ETL ran about 10,000 DML transactions per hour, and eventually I ran into the Global Catalog Lock issue: some of the transactions were being killed by lock timeouts.
So I started by reducing the load. Now I run about 5,000 DML transactions per hour on a 16-node cluster, but I'm still hitting the catalog lock issue. So far I've figured out (using roughly the query sketched after the table below) that some transactions hold the catalog lock for seconds, tens of seconds, and sometimes up to a minute:
transaction_start | gc_grant_time | gc_release_time | description |
---|---|---|---|
2017-10-23 21:36:44 | 2017-10-23 21:37:38 | 2017-10-23 21:38:23 | Txn: d0000001e938fa 'AnalyzeStats: (Table: SATI.S_ExtUser_Type) |
2017-10-23 21:34:16 | 2017-10-23 21:38:29 | 2017-10-23 21:39:07 | Txn: 110000003edd880 'INSERT /*+ direct */ INTO sati.S_URLRel_UR |
2017-10-23 21:36:46 | 2017-10-23 21:38:29 | 2017-10-23 21:39:07 | Txn: 160000001e77d94 'INSERT /*+ direct */ INTO sati.S_ExtItem_D |
2017-10-23 19:32:48 | 2017-10-23 19:32:58 | 2017-10-23 19:33:29 | Txn: 140000001ed1424 'COPY SAEM.log_ads_auction (log_id,rnk,bid, |
2017-10-23 21:36:44 | 2017-10-23 21:37:38 | 2017-10-23 21:38:08 | Txn: 190000000167fbd 'INSERT /*+ direct */ INTO sati.S_ExtItem_L |
2017-10-23 21:31:10 | 2017-10-23 21:37:38 | 2017-10-23 21:38:08 | Txn: 170000001f0bde0 'INSERT /*+ direct */ INTO SALM.v_ads_recom |
2017-10-23 21:36:44 | 2017-10-23 21:37:38 | 2017-10-23 21:38:07 | Txn: 110000003edd8d0 'INSERT /*+ direct */ INTO sati.S_ExtItem_L |
2017-10-23 21:32:08 | 2017-10-23 21:37:38 | 2017-10-23 21:38:07 | Txn: e000000249e5da 'insert /*+direct,syntactic_join,verbatim*/ |
2017-10-23 21:36:39 | 2017-10-23 21:37:38 | 2017-10-23 21:38:07 | Txn: 1800000001ee432 'INSERT /*+ direct */ INTO sati.S_ExtItem_I |
2017-10-23 21:36:42 | 2017-10-23 21:37:38 | 2017-10-23 21:38:07 | Txn: 190000000167fbb 'INSERT /*+ direct */ INTO sati.S_ExtItem_I |
2017-10-23 21:32:12 | 2017-10-23 21:37:38 | 2017-10-23 21:38:07 | Txn: b0000002938a87 'insert /*+direct,syntactic_join,verbatim*/ |
2017-10-23 20:34:39 | 2017-10-23 20:35:11 | 2017-10-23 20:35:37 | Txn: 190000000166f4e 'drop_partition on table sae.delivery_order |
2017-10-23 20:35:03 | 2017-10-23 20:35:28 | 2017-10-23 20:35:52 | Txn: 190000000166f58 'drop_partition on table sae.delivery_order |
2017-10-23 21:37:00 | 2017-10-23 21:37:21 | 2017-10-23 21:37:45 | Txn: 130000001f0b676 'analyze_row_count' |
2017-10-23 20:34:50 | 2017-10-23 20:37:19 | 2017-10-23 20:37:42 | Txn: d0000001e92b14 'COPY SAEM.log_items_search (log_id,ip,cooki |
2017-10-23 21:06:23 | 2017-10-23 21:06:47 | 2017-10-23 21:07:10 | Txn: 110000003edd2e0 'Truncate table salx.msfo_scenarios;' |
2017-10-23 20:36:32 | 2017-10-23 20:37:06 | 2017-10-23 20:37:29 | Txn: 150000001ed32bf 'CREATE TABLE tmp.saed_sales_report_2017102 |
2017-10-23 20:36:32 | 2017-10-23 20:37:43 | 2017-10-23 20:38:05 | Txn: 150000001ed32bb 'CREATE TABLE tmp.saed_event_survey_2017102 |
2017-10-23 20:36:31 | 2017-10-23 20:37:21 | 2017-10-23 20:37:43 | Txn: 150000001ed32b3 'CREATE TABLE tmp.saed_event_category_20171 |
2017-10-23 20:34:27 | 2017-10-23 20:34:41 | 2017-10-23 20:35:02 | Txn: 190000000166f3c 'drop_partition on table sae.delivery_clien |
2017-10-23 21:06:18 | 2017-10-23 21:06:24 | 2017-10-23 21:06:45 | Txn: a0000005c1a773 'Truncate table salx.msfo_managment_lines;' |
2017-10-23 20:34:27 | 2017-10-23 20:34:50 | 2017-10-23 20:35:10 | Txn: 190000000166f43 'drop_partition on table sae.delivery_order |
2017-10-23 21:37:16 | 2017-10-23 21:38:08 | 2017-10-23 21:38:29 | Txn: 150000001ed48e4 'analyze_row_count' |
2017-10-23 20:06:59 | 2017-10-23 20:07:04 | 2017-10-23 20:07:24 | Txn: 170000001f0a174 'truncate table satl.S_ClickCampaign_Title' |
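For reference, this is roughly how I pulled the numbers above out of the Data Collector. It's only a sketch: the column names are what I see on my build and may differ between Vertica versions, and the transaction_start column in the table came from a separate join I've omitted here.

```sql
-- Longest recent holders of the Global Catalog lock.
-- dc_lock_releases records the release time ("time") together with the
-- time the lock was granted (grant_time), so their difference is the
-- hold duration.
SELECT grant_time  AS gc_grant_time,
       "time"      AS gc_release_time,
       description
FROM dc_lock_releases
WHERE object_name = 'Global Catalog'
  AND "time" - grant_time > INTERVAL '10 seconds'
ORDER BY "time" - grant_time DESC;
```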
This happens from time to time, even though the number of transactions does not change dramatically.
So, can anybody point out whether there is any profiling for catalog locks, or any best practices for avoiding this?
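The only live view I have at the moment is polling the LOCKS system table; a minimal sketch, assuming v_monitor.locks exposes the documented columns:

```sql
-- Who is currently holding or waiting on the Global Catalog lock?
SELECT node_names,
       lock_mode,
       request_timestamp,
       grant_timestamp,
       transaction_description
FROM v_monitor.locks
WHERE object_name = 'Global Catalog'
ORDER BY request_timestamp;
```

That only shows the current moment, though, so it tells me nothing about why a holder kept the lock for a minute.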
I know how badly this can go in PostgreSQL, where a process that wants more than one lock (two, for example) can pend almost forever: it takes the first lock, then waits on the second while still holding the first, stalling all other processes queued behind that first lock (see the sketch below). Does the same apply to Vertica?
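Just to illustrate the PostgreSQL pattern I mean (t1 and t2 are made-up tables):

```sql
-- Session A:
BEGIN;
LOCK TABLE t1 IN ACCESS EXCLUSIVE MODE;  -- lock 1 acquired immediately
LOCK TABLE t2 IN ACCESS EXCLUSIVE MODE;  -- blocks while another session holds t2,
                                         -- but A keeps holding t1 the whole time

-- Sessions B, C, D (concurrently):
BEGIN;
LOCK TABLE t1 IN ACCESS EXCLUSIVE MODE;  -- all pile up behind A, which is itself
                                         -- stalled waiting for t2
```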