Query failing on a cluster with 2 nodes down

Hi, I have a cluster of 5 nodes (v6.1.2), with 2 nodes currently down. Elastic_cluster is enabled, all my design is ksafe=2. I would thus expect to be able to carry on working as usual for whichever 2 nodes down. I now have queries failing with the following error:
  ERROR: Insufficient projections to answer query   Detail: Path segment called for on node v_spil_dwh_node0003 can not be computed  
node3 is indeed one of the nodes down. I checked all the projections of the tables referenced in the query, they all are ksafe=2, I can select from all of them. There is nothing relevant I could find in the documentation. What could be the cause (and solution) of this? Thanks, Guillaume

Comments

  • Hi, Can you run the below queries and check if there is any critical node which is down to serve the query result? select * from nodes; select * from critical_nodes; select get_node_dependencies();
  • Hi, Thanks for your answer. In the meantime I did restart the nodes, so the answers to your query might not be relevant any more, but the problem stays, and this is not the first time I see it. This means I currently cannot trust vertica to run with one or 2 nodes down. Select * from critical_nodes is now empty (I wish I had known this one before, though).
      pil_dwh=> select get_node_dependencies();                                                                            get_node_dependencies  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------   Deps:  00111 - cnt: 679  01011 - cnt: 4  01101 - cnt: 3  01110 - cnt: 672  10011 - cnt: 675  10101 - cnt: 4  11001 - cnt: 672  11010 - cnt: 7  11100 - cnt: 679  11111 - cnt: 227  (1 row)  
    Would you mind explain what this result means? there is nothing in the doc about it. At the time, nodes 3 and 5 were down. Thanks,
  • Hi Guillaume, Could you please send me query and export the objects in that query. Regards, Bhawana
  • Sure:
      INSERT  INTO      dcl.wc_event      (          realm_id ,          campaign_id ,          list_id ,          email ,          event_dts ,          event_type_id ,          event_cnt      )      at epoch latest  SELECT      dcl_wcr.realm_id,      stc_wc.campaign_id,      stc_wc.list_id,      stc_wc.email,      stc_wc.event_date AS event_dts,      stc_wc.event_type AS event_type_id,      COUNT(*)          AS event_cnt  FROM      stg.wc_hourly_report_tracked_events stc_wc  JOIN      dcl.wc_realm dcl_wcr  ON      stc_wc.realm = dcl_wcr.realm_name  WHERE      stc_wc.batch_source_path IN      (          SELECT              batch_source_path          FROM              temp_insert_wc )  GROUP BY      dcl_wcr.realm_id,      stc_wc.campaign_id,      stc_wc.list_id,      stc_wc.email,      stc_wc.event_date,      stc_wc.event_type    ERROR: Insufficient projections to answer query   Detail: Path segment called for on node v_spil_dwh_node0003 can not be computed    spil_dwh=> select export_objects('', 'dcl.wc_event');    CREATE TABLE dcl.wc_event  (      realm_id int,      campaign_id int,      list_id int,      email varchar(255),      event_dts timestamp,      event_type_id int,      event_cnt int  );        CREATE PROJECTION dcl.wc_event /*+createtype(L)*/  (   realm_id,   campaign_id,   list_id,   email,   event_dts,   event_type_id,   event_cnt  )  AS   SELECT wc_event.realm_id,          wc_event.campaign_id,          wc_event.list_id,          wc_event.email,          wc_event.event_dts,          wc_event.event_type_id,          wc_event.event_cnt   FROM dcl.wc_event   ORDER BY wc_event.realm_id,            wc_event.campaign_id,            wc_event.list_id,            wc_event.email,            wc_event.event_dts,            wc_event.event_type_id  SEGMENTED BY hash(wc_event.realm_id, wc_event.campaign_id, wc_event.list_id, wc_event.email, wc_event.event_dts, wc_event.event_type_id) ALL NODES KSAFE 2;      SELECT MARK_DESIGN_KSAFE(2);    (1 row)    spil_dwh=> select export_objects('', 'stg.wc_hourly_report_tracked_events');    CREATE TABLE stg.wc_hourly_report_tracked_events  (      batch_source_name varchar(255),      batch_source_path varchar(255),      batch_import_timestamp timestamp,      realm varchar(100),      campaign_id int,      campaign_alias varchar(500),      list_id int,      list_name varchar(500),      subscriber_id int,      email varchar(500),      first_name varchar(500),      last_name varchar(500),      event_date timestamp,      event_type int  );        CREATE PROJECTION stg.wc_hourly_report_tracked_events /*+createtype(L)*/  (   batch_source_name,   batch_source_path,   batch_import_timestamp,   realm,   campaign_id,   campaign_alias,   list_id,   list_name,   subscriber_id,   email,   first_name,   last_name,   event_date,   event_type  )  AS   SELECT wc_hourly_report_tracked_events.batch_source_name,          wc_hourly_report_tracked_events.batch_source_path,          wc_hourly_report_tracked_events.batch_import_timestamp,          wc_hourly_report_tracked_events.realm,          wc_hourly_report_tracked_events.campaign_id,          wc_hourly_report_tracked_events.campaign_alias,          wc_hourly_report_tracked_events.list_id,          wc_hourly_report_tracked_events.list_name,          wc_hourly_report_tracked_events.subscriber_id,          wc_hourly_report_tracked_events.email,          wc_hourly_report_tracked_events.first_name,          wc_hourly_report_tracked_events.last_name,          wc_hourly_report_tracked_events.event_date,          wc_hourly_report_tracked_events.event_type   FROM stg.wc_hourly_report_tracked_events   ORDER BY wc_hourly_report_tracked_events.batch_source_name,            wc_hourly_report_tracked_events.batch_source_path,            wc_hourly_report_tracked_events.batch_import_timestamp,            wc_hourly_report_tracked_events.realm,            wc_hourly_report_tracked_events.campaign_id,            wc_hourly_report_tracked_events.campaign_alias,            wc_hourly_report_tracked_events.list_id,            wc_hourly_report_tracked_events.list_name,            wc_hourly_report_tracked_events.subscriber_id,            wc_hourly_report_tracked_events.email,            wc_hourly_report_tracked_events.first_name,            wc_hourly_report_tracked_events.last_name,            wc_hourly_report_tracked_events.event_date,            wc_hourly_report_tracked_events.event_type  SEGMENTED BY hash(wc_hourly_report_tracked_events.batch_import_timestamp, wc_hourly_report_tracked_events.campaign_id, wc_hourly_report_tracked_events.list_id, wc_hourly_report_tracked_events.subscriber_id, wc_hourly_report_tracked_events.event_date, wc_hourly_report_tracked_events.event_type, wc_hourly_report_tracked_events.realm, wc_hourly_report_tracked_events.batch_source_name, wc_hourly_report_tracked_events.batch_source_path, wc_hourly_report_tracked_events.campaign_alias, wc_hourly_report_tracked_events.list_name, wc_hourly_report_tracked_events.email, wc_hourly_report_tracked_events.first_name, wc_hourly_report_tracked_events.last_name) ALL NODES KSAFE 2;      SELECT MARK_DESIGN_KSAFE(2);    (1 row)    spil_dwh=> select export_objects('', 'dcl.wc_realm');      CREATE TABLE dcl.wc_realm  (      realm_id int NOT NULL,      realm_name varchar(255),      realm_desc varchar(255),      password varchar(25),      channelid int  );    ALTER TABLE dcl.wc_realm ADD CONSTRAINT pk_dcl_wc_realm PRIMARY KEY (realm_id);      CREATE PROJECTION dcl.wc_realm_DBD_200_rep_initial_initial /*+createtype(D)*/  (   realm_id,   realm_name,   realm_desc,   password,   channelid  )  AS   SELECT wc_realm.realm_id,          wc_realm.realm_name,          wc_realm.realm_desc,          wc_realm.password,          wc_realm.channelid   FROM dcl.wc_realm   ORDER BY wc_realm.realm_id  UNSEGMENTED ALL NODES;      SELECT MARK_DESIGN_KSAFE(2);    (1 row)    
    The last table being a temporary table, I cannot export it. This could well be the reason... I am still very curious about the signification of get_node_dependencies(); Thanks,
  • Hi, Please make sure that all the projections involved in the query are up to date. There is a column in the projection 'is_up_to_date'. And also check the projection on that temp table. Regards, Bhawana
  • Hi, I will look into this, Could you explain to me the meaning of get_node_dependencies() ? Thanks,
  • Hi, I juts checked, and can confirm that all my projections are up to date:
      select projection_schema, projection_name FROM projections WHERE not is_up_to_date;      projection_schema | projection_name  -------------------+-----------------  (0 rows)  
  • Hi, Node dependencies tells us the other nodes on which a specific node is dependent for k-safety. Traditionally, this obtained by running get_node_dependencies() Regards, Bhawana
  • Thanks for this. In that case i am a bit surprised by the results. It looks like some nodes are very dependent on each other (high count) and some almost not dependent but still a bit (count of ~4) ? Thanks,
  • Hi, Looking at your get_node_dependencies() , it looks like the data are not distributed to all the nodes correctly. 00111 - cnt: 679 01011 - cnt: 4 ( but here for 3 nodes it has only 4 projections) 01101 - cnt: 3( same here , it has only 3 projections) This indicate that the data are not distributed equally. Did you add the new node recently? if yes did you do the rebalance after that.If rebalance doesn't complete successfully, node dependencies will be messed up If the nodes are still down please bring that up and do the the Re-balance data from the admintool and run this get_node_dependencies() command again. Regards, Bhawana
  • Thanks a lot for this, Bhawana. Weirdly enough, I added all those nodes at the same time. I will rebalance to try to fix this. What would be the expected result of get_node_dependencies()? Thanks,
  • Hi, The data should be distributed equally on all nodes. You can check it after you are done with the rebalance. Run this :select get_node_dependencies(); and we will check the result. Regards, Bhawana
  • Hi Bhawana, I ran the rebalance, and here are the results of get_node_dependencies(): Deps: 00111 - cnt: 698 01011 - cnt: 4 01101 - cnt: 3 01110 - cnt: 691 10011 - cnt: 694 10101 - cnt: 4 11001 - cnt: 691 11010 - cnt: 7 11100 - cnt: 698 11111 - cnt: 227 I still do not understand what I am suppose to find. Would you mind explaining it? Thanks,

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file