crash in ResegmentMerge after upgrade to 9.2
phil2
✭
Hello!
I'm facing SIGSEGV on some queries after an upgrade from 8.1.1-4 to 9.2.0-7
Queries:
select * from( select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id, min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from , lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id )t where admuser_Id = 124954500001 limit 50; select *, lu.AdmUser_id, ia.IsActual, row_number() over(partition by l.AdmUserGroupLink_id) rn, lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id where ia.IsActual is not null order by lu.AdmUser_id desc limit 100;
DDL is in attachment
I have already recreated and refilled these tables and it did not help.
Each time I run it I get SIGSEGV on every vertica node.
Backtrace is in attachments.
Looks like the problem within StorageMerge in a ResegmentMerge::performResegmen call:
(_ZN2EE14ResegmentMerge16performResegmentEPNS_11MergeBufferEiiRNS_14TupleStructureERSt6vectorIPNS_13SegTempBufferESaIS7_EEiiPNS_5VEvalERNS_11VEvalParamsE+0x551) [0x130fa81]
Can you help me with fixing this?
0
Comments
What was your upgrade path?
vertica_8.1.1-4_amd64.deb (initial)
vertica_9.0.1-20_amd64.deb
vertica_9.1.1-8_amd64.deb
vertica_9.2.0-7_amd64.deb
@phil2 - Are you running on Debian Linux Versions 8.5 or 8.9?
I tried your examples on Vertica 9.2.1-1 on RHEL and did not have an issue. Althouogh, I did not have any data.
@phil2 - You are running Vertica 9.2.0-7 on an unsupported version of Debian Linux.
See:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SupportedPlatforms/MCandServer.htm
@LenoyJ Yes, I did start the DB after each upgrade step
@Jim_Knicely So you won't help us?
here is the data to reproduce crash
I tried without loading data on 9.2.0-7. it did not crash. I will try by using the data you provided.
dbadmin=> select version();
version
Vertica Analytic Database v9.2.0-7
(1 row)
dbadmin=>
dbadmin=> select *
dbadmin-> from(
dbadmin(> select lu.AdmUser_id, ia.IsActual, l.AdmUserGroupLink_id,
dbadmin(> min(ia.Actual_date) over(partition by ia.AdmUserGroupLink_id) as valid_from ,
dbadmin(> lead(ia.Actual_date, 1, null) over(partition by ia.AdmUserGroupLink_id order by ia.Actual_date) as valid_to
dbadmin(> from DDS.H_AdmUserGroupLink l
dbadmin(> join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
dbadmin(> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
dbadmin(> )t
dbadmin-> where admuser_Id = 124954500001
dbadmin-> limit 50;
AdmUser_id | IsActual | AdmUserGroupLink_id | valid_from | valid_to
------------+----------+---------------------+------------+----------
(0 rows)
dbadmin=>
dbadmin=> select *, lu.AdmUser_id, ia.IsActual,
dbadmin-> row_number() over(partition by l.AdmUserGroupLink_id) rn,
dbadmin-> lead(lu.AdmUser_id) over(partition by l.AdmUserGroupLink_id order by true) valid_to
dbadmin-> from DDS.H_AdmUserGroupLink l join DDS.L_AdmUserGroupLink_AdmUser lu on lu.AdmUserGroupLink_id = l.AdmUserGroupLink_id
dbadmin-> join DDS.S_AdmUserGroupLink_IsActual ia on ia.AdmUserGroupLink_id = l.AdmUserGroupLink_id
dbadmin-> where ia.IsActual is not null
dbadmin-> order by lu.AdmUser_id desc
dbadmin-> limit 100;
AdmUserGroupLink_id | External_ID | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | AdmUser_id | LAUNCH_ID | Source_system_id | AdmUserGroupLink_id | IsActual | Actual_date | Load_date | Source_system_id | LAUNCH_ID | AdmUser_id | IsActual | rn | valid_to
---------------------+-------------+-----------+------------------+---------------------+------------+-----------+------------------+---------------------+----------+-------------+-----------+------------------+-----------+------------+----------+----+----------
(0 rows)
Hi Phil,
Yes, I removed it. So far loaded data into H_AdmUserGroupLink. Can you share me the count of rows for each table. Does count of rows matter?
Sruthi
Just tested in Vertica 9.2.1-1 and the second query causes a DB crash!
@phil2 - Are you a client? Can you open a case? I will open a JIRA.
Hi Jim,
There is a support case for this issue logged by client today SD02484487
Sruthi
Yeap, It was me
@phil2 - I reproduced your issue by simply using your DDL and data. This issue has been identified and engineering is working on a fix. I will keep this thread open with updates!
If you email your customer SAID I can view the case and update support with the JIRA info.
My email address is james.knicely@microfocus.com
Thanks!
Hi Jim, we are also experiencing this issue. We have a ticket opened with Vertica support. Is there any ETA on a fix?
Sorry, I should have updated this thread earlier! The issue was resolved in 9.2.1-4.
See:
https://www.vertica.com/docs/ReleaseNotes/9.2.x/Vertica_9.2.x_Release_Notes.htm
In particular:
Issue VER-68156 - Certain combinations of analytic functions applied to the output of a merge join would occasionally cause Vertica to crash. This issue has been fixed.
I retried the original query in this thread that caused the DB to crash, and it works fine now!
Thank you