crash in ResegmentMerge after upgrade to 9.2

Hello!

I'm facing SIGSEGV on some queries after an upgrade from 8.1.1-4 to 9.2.0-7

Queries:

select * 
from( 
    select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id, 
        min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from ,
        lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to 
    from DDS.H_AdmUserGroupLink l 
    join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
    join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
)t 
where admuser_Id = 124954500001 
limit 50;

select *, lu.AdmUser_id, ia.IsActual, 
    row_number() over(partition by l.AdmUserGroupLink_id) rn, 
    lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id 
where ia.IsActual is not null 
order by lu.AdmUser_id desc 
limit 100;

DDL is in attachment

I have already recreated and refilled these tables and it did not help.
Each time I run it I get SIGSEGV on every vertica node.
Backtrace is in attachments.
Looks like the problem within StorageMerge in a ResegmentMerge::performResegmen call:

(_ZN2EE14ResegmentMerge16performResegmentEPNS_11MergeBufferEiiRNS_14TupleStructureERSt6vectorIPNS_13SegTempBufferESaIS7_EEiiPNS_5VEvalERNS_11VEvalParamsE+0x551) [0x130fa81]

Can you help me with fixing this?

Comments

  • VertiguyVertiguy Administrator

    What was your upgrade path?

  • vertica_8.1.1-4_amd64.deb (initial)
    vertica_9.0.1-20_amd64.deb
    vertica_9.1.1-8_amd64.deb
    vertica_9.2.0-7_amd64.deb

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited June 2019

    @phil2 - Are you running on Debian Linux Versions 8.5 or 8.9?

    I tried your examples on Vertica 9.2.1-1 on RHEL and did not have an issue. Althouogh, I did not have any data.

  • cat /etc/debian_version
    8.6
    uname -a
    Linux avi-dwh15 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux
    
  • LenoyJLenoyJ - Select Field - Employee
    edited June 2019
    Just checking if you started up the DB after every package install?
  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @phil2 - You are running Vertica 9.2.0-7 on an unsupported version of Debian Linux.

    See:
    https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SupportedPlatforms/MCandServer.htm

  • @LenoyJ Yes, I did start the DB after each upgrade step
    @Jim_Knicely So you won't help us?

  • SruthiASruthiA Administrator

    I tried without loading data on 9.2.0-7. it did not crash. I will try by using the data you provided.

    dbadmin=> select version();

    version

    Vertica Analytic Database v9.2.0-7
    (1 row)

    dbadmin=>

    dbadmin=> select *
    dbadmin-> from(
    dbadmin(> select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id,
    dbadmin(> min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from ,
    dbadmin(> lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to
    dbadmin(> from DDS.H_AdmUserGroupLink l
    dbadmin(> join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(> )t
    dbadmin-> where admuser_Id = 124954500001
    dbadmin-> limit 50;
    AdmUser_id | IsActual | AdmUserGroupLink_id | valid_from | valid_to
    ------------+----------+---------------------+------------+----------
    (0 rows)

    dbadmin=>
    dbadmin=> select *, lu.AdmUser_id, ia.IsActual,
    dbadmin-> row_number() over(partition by l.AdmUserGroupLink_id) rn,
    dbadmin-> lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
    dbadmin-> from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> where ia.IsActual is not null
    dbadmin-> order by lu.AdmUser_id desc
    dbadmin-> limit 100;
    AdmUserGroupLink_id | External_ID | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | AdmUser_id | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | IsActual | Actual_date | Load_date | Source_system_id | LAUNCH_ID | AdmUser_id | IsActual | rn | valid_to
    ---------------------+-------------+-----------+------------------+---------------------+------------+-----------+------------------+---------------------+----------+-------------+-----------+------------------+-----------+------------+----------+----+----------
    (0 rows)

  • @SruthiA you need to remove identity column from ddl in order to load the data from csv. We managed to reproduce the crash on a newly installed single node cluster with these data running 9.2.0-7
  • SruthiASruthiA Administrator
    edited June 2019

    Hi Phil,

    Yes, I removed it. So far loaded data into H_AdmUserGroupLink. Can you share me the count of rows for each table. Does count of rows matter?

    Sruthi

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Just tested in Vertica 9.2.1-1 and the second query causes a DB crash!

    dbadmin=> SELECT COUNT(*) FROM dds.H_AdmUserGroupLink;
     COUNT
    -------
      7676
    (1 row)
    
    dbadmin=> SELECT COUNT(*) FROM dds.L_AdmUserGroupLink_AdmUser;
     COUNT
    -------
      7676
    (1 row)
    
    dbadmin=> SELECT COUNT(*) FROM dds.S_AdmUserGroupLink_IsActual;
     COUNT
    -------
     24407
    (1 row)
    
    dbadmin=> select *
    dbadmin-> from(
    dbadmin(>     select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id,
    dbadmin(>         min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from ,
    dbadmin(>         lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to
    dbadmin(>     from DDS.H_AdmUserGroupLink l
    dbadmin(>     join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(>     join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin(> )t
    dbadmin-> where admuser_Id = 124954500001
    dbadmin-> limit 50;
      AdmUser_id  | IsActual | AdmUserGroupLink_id |         valid_from         | valid_to
    --------------+----------+---------------------+----------------------------+----------
     124954500001 | t        |         23409000001 | 2019-04-04 02:03:28.320633 |
    (1 row)
    
    dbadmin=> select *, lu.AdmUser_id, ia.IsActual,
    dbadmin->     row_number() over(partition by l.AdmUserGroupLink_id) rn,
    dbadmin->     lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
    dbadmin-> from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> where ia.IsActual is not null
    dbadmin-> order by lu.AdmUser_id desc
    dbadmin-> limit 100;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !>
    

    @phil2 - Are you a client? Can you open a case? I will open a JIRA.

  • SruthiASruthiA Administrator
    edited June 2019

    Hi Jim,

    There is a support case for this issue logged by client today SD02484487

    Sruthi

  • Yeap, It was me

  • @Jim_Knicely can you describe what you did to reproduce crash in a support ticket and open a jira issue. Nina have difficulties with reproducing it.
  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @phil2 - I reproduced your issue by simply using your DDL and data. This issue has been identified and engineering is working on a fix. I will keep this thread open with updates!

    If you email your customer SAID I can view the case and update support with the JIRA info.

    My email address is james.knicely@microfocus.com

    Thanks!

  • andrewringandrewring Vertica Customer

    Hi Jim, we are also experiencing this issue. We have a ticket opened with Vertica support. Is there any ETA on a fix?

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited August 2019

    Sorry, I should have updated this thread earlier! The issue was resolved in 9.2.1-4.

    See:
    https://www.vertica.com/docs/ReleaseNotes/9.2.x/Vertica_9.2.x_Release_Notes.htm

    In particular:
    Issue VER-68156 - Certain combinations of analytic functions applied to the output of a merge join would occasionally cause Vertica to crash. This issue has been fixed.

    I retried the original query in this thread that caused the DB to crash, and it works fine now!

    dbadmin=> SELECT version();
                  version
    ------------------------------------
     Vertica Analytic Database v9.2.1-4
    (1 row)
    
    dbadmin=> select *, lu.AdmUser_id, ia.IsActual,
    dbadmin->     row_number() over(partition by l.AdmUserGroupLink_id) rn,
    dbadmin->     lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
    dbadmin-> from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
    dbadmin-> where ia.IsActual is not null
    dbadmin-> order by lu.AdmUser_id desc
    dbadmin-> limit 1;
     AdmUserGroupLink_id |     External_ID     | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id |  AdmUser_id  | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | IsActual |        Actual_date         |         Load_date          | Source_system_id | LAUNCH_ID |  AdmUser_id  | IsActual | rn | valid_to
    ---------------------+---------------------+-----------+------------------+---------------------+--------------+-----------+------------------+---------------------+----------+----------------------------+----------------------------+------------------+-----------+--------------+----------+----+----------
             24429000001 | 8937546938320497366 |   3880948 |                1 |         24429000001 | 129441500001 |   3880948 |                1 |         24429000001 | t        | 2019-06-06 01:31:06.664558 | 2019-06-06 01:31:06.664558 |                1 |   3880948 | 129441500001 | t        |  1 |
    (1 row)
    
  • Thank you

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file