The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.

crash in ResegmentMerge after upgrade to 9.2

Hello!

I'm facing SIGSEGV on some queries after an upgrade from 8.1.1-4 to 9.2.0-7

Queries:

select * 
from( 
    select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id, 
        min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from ,
        lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to 
    from DDS.H_AdmUserGroupLink l 
    join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
    join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
)t 
where admuser_Id = 124954500001 
limit 50;

select *, lu.AdmUser_id, ia.IsActual, 
    row_number() over(partition by l.AdmUserGroupLink_id) rn, 
    lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
where ia.IsActual is not null 
order by lu.AdmUser_id desc 
limit 100;

DDL is in attachment

I have already recreated and refilled these tables and it did not help.
Each time I run it I get SIGSEGV on every vertica node.
Backtrace is in attachments.
Looks like the problem within StorageMerge in a ResegmentMerge::performResegmen call:

(_ZN2EE14ResegmentMerge16performResegmentEPNS_11MergeBufferEiiRNS_14TupleStructureERSt6vectorIPNS_13SegTempBufferESaIS7_EEiiPNS_5VEvalERNS_11VEvalParamsE+0x551) [0x130fa81]

Can you help me with fixing this?

Comments

  • VertiguyVertiguy Administrator

    What was your upgrade path?

  • vertica_8.1.1-4_amd64.deb (initial)
    vertica_9.0.1-20_amd64.deb
    vertica_9.1.1-8_amd64.deb
    vertica_9.2.0-7_amd64.deb

  • Jim_KnicelyJim_Knicely Administrator
    edited June 5

    @phil2 - Are you running on Debian Linux Versions 8.5 or 8.9?

    I tried your examples on Vertica 9.2.1-1 on RHEL and did not have an issue. Althouogh, I did not have any data.

  • cat /etc/debian_version
    8.6
    uname -a
    Linux avi-dwh15 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux
    
  • LenoyJLenoyJ Employee
    edited June 5
    Just checking if you started up the DB after every package install?
  • Jim_KnicelyJim_Knicely Administrator

    @phil2 - You are running Vertica 9.2.0-7 on an unsupported version of Debian Linux.

    See:
    https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SupportedPlatforms/MCandServer.htm

  • @LenoyJ Yes, I did start the DB after each upgrade step
    @Jim_Knicely So you won't help us?

  • SruthiASruthiA Employee

    I tried without loading data on 9.2.0-7. it did not crash. I will try by using the data you provided.

    dbadmin=> select version();

    version

    Vertica Analytic Database v9.2.0-7
    (1 row)

    dbadmin=>

    dbadmin=> select *
    dbadmin-> from(
    dbadmin(> select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id,
    dbadmin(> min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from ,
    dbadmin(> lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to
    dbadmin(> from DDS.H_AdmUserGroupLink l
    dbadmin(> join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(> )t
    dbadmin-> where admuser_Id = 124954500001
    dbadmin-> limit 50;
    AdmUser_id | IsActual | AdmUserGroupLink_id | valid_from | valid_to
    ------------+----------+---------------------+------------+----------
    (0 rows)

    dbadmin=>
    dbadmin=> select *, lu.AdmUser_id, ia.IsActual,
    dbadmin-> row_number() over(partition by l.AdmUserGroupLink_id) rn,
    dbadmin-> lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
    dbadmin-> from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> where ia.IsActual is not null
    dbadmin-> order by lu.AdmUser_id desc
    dbadmin-> limit 100;
    AdmUserGroupLink_id | External_ID | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | AdmUser_id | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | IsActual | Actual_date | Load_date | Source_system_id | LAUNCH_ID | AdmUser_id | IsActual | rn | valid_to
    ---------------------+-------------+-----------+------------------+---------------------+------------+-----------+------------------+---------------------+----------+-------------+-----------+------------------+-----------+------------+----------+----+----------
    (0 rows)

  • @SruthiA you need to remove identity column from ddl in order to load the data from csv. We managed to reproduce the crash on a newly installed single node cluster with these data running 9.2.0-7
  • SruthiASruthiA Employee
    edited June 7

    Hi Phil,

    Yes, I removed it. So far loaded data into H_AdmUserGroupLink. Can you share me the count of rows for each table. Does count of rows matter?

    Sruthi

  • Jim_KnicelyJim_Knicely Administrator

    Just tested in Vertica 9.2.1-1 and the second query causes a DB crash!

    dbadmin=> SELECT COUNT(*) FROM dds.H_AdmUserGroupLink;
     COUNT
    -------
      7676
    (1 row)
    
    dbadmin=> SELECT COUNT(*) FROM dds.L_AdmUserGroupLink_AdmUser;
     COUNT
    -------
      7676
    (1 row)
    
    dbadmin=> SELECT COUNT(*) FROM dds.S_AdmUserGroupLink_IsActual;
     COUNT
    -------
     24407
    (1 row)
    
    dbadmin=> select *
    dbadmin-> from(
    dbadmin(>     select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id,
    dbadmin(>         min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from ,
    dbadmin(>         lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to
    dbadmin(>     from DDS.H_AdmUserGroupLink l
    dbadmin(>     join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(>     join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(> )t
    dbadmin-> where admuser_Id = 124954500001
    dbadmin-> limit 50;
      AdmUser_id  | IsActual | AdmUserGroupLink_id |         valid_from         | valid_to
    --------------+----------+---------------------+----------------------------+----------
     124954500001 | t        |         23409000001 | 2019-04-04 02:03:28.320633 |
    (1 row)
    
    dbadmin=> select *, lu.AdmUser_id, ia.IsActual,
    dbadmin->     row_number() over(partition by l.AdmUserGroupLink_id) rn,
    dbadmin->     lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
    dbadmin-> from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> where ia.IsActual is not null
    dbadmin-> order by lu.AdmUser_id desc
    dbadmin-> limit 100;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !>
    

    @phil2 - Are you a client? Can you open a case? I will open a JIRA.

  • SruthiASruthiA Employee
    edited June 7

    Hi Jim,

    There is a support case for this issue logged by client today SD02484487

    Sruthi

  • Yeap, It was me

  • @Jim_Knicely can you describe what you did to reproduce crash in a support ticket and open a jira issue. Nina have difficulties with reproducing it.
  • Jim_KnicelyJim_Knicely Administrator

    @phil2 - I reproduced your issue by simply using your DDL and data. This issue has been identified and engineering is working on a fix. I will keep this thread open with updates!

    If you email your customer SAID I can view the case and update support with the JIRA info.

    My email address is [email protected]

    Thanks!

  • Hi Jim, we are also experiencing this issue. We have a ticket opened with Vertica support. Is there any ETA on a fix?

  • Jim_KnicelyJim_Knicely Administrator
    edited August 16

    Sorry, I should have updated this thread earlier! The issue was resolved in 9.2.1-4.

    See:
    https://www.vertica.com/docs/ReleaseNotes/9.2.x/Vertica_9.2.x_Release_Notes.htm

    In particular:
    Issue VER-68156 - Certain combinations of analytic functions applied to the output of a merge join would occasionally cause Vertica to crash. This issue has been fixed.

    I retried the original query in this thread that caused the DB to crash, and it works fine now!

    dbadmin=> SELECT version();
                  version
    ------------------------------------
     Vertica Analytic Database v9.2.1-4
    (1 row)
    
    dbadmin=> select *, lu.AdmUser_id, ia.IsActual,
    dbadmin->     row_number() over(partition by l.AdmUserGroupLink_id) rn,
    dbadmin->     lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
    dbadmin-> from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> where ia.IsActual is not null
    dbadmin-> order by lu.AdmUser_id desc
    dbadmin-> limit 1;
     AdmUserGroupLink_id |     External_ID     | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id |  AdmUser_id  | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | IsActual |        Actual_date         |         Load_date          | Source_system_id | LAUNCH_ID |  AdmUser_id  | IsActual | rn | valid_to
    ---------------------+---------------------+-----------+------------------+---------------------+--------------+-----------+------------------+---------------------+----------+----------------------------+----------------------------+------------------+-----------+--------------+----------+----+----------
             24429000001 | 8937546938320497366 |   3880948 |                1 |         24429000001 | 129441500001 |   3880948 |                1 |         24429000001 | t        | 2019-06-06 01:31:06.664558 | 2019-06-06 01:31:06.664558 |                1 |   3880948 | 129441500001 | t        |  1 |
    (1 row)
    
  • Thank you

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.