Received fatal signal SIGSEGV with CREATE PROJECTION.. SEGMENTED BY HASH(...)

Hi!
I am creating a projection manually on a 3-node Vertica cluster.
When I create the projection with the clause
 UNSEGMENTED ALL NODES KSAFE 1
everything goes well.

But if I create the projection with the clause
SEGMENTED BY HASH(A,B,C,D) ALL NODES KSAFE 1
the node from which I launched the query goes down, and in vertica.log I find this error:
[Main] <INFO> Handling signal: 11
[Main] <PANIC> Received fatal signal SIGSEGV.
[Main] <PANIC> Info: si_code: 1, si_pid: 277, si_uid: 0, si_addr: 0x115
<WARNING> @v_vertica_node0001: 01000/5439: Vertica suggests allowing 1 open file per MB of memory, minimum of 65536; see 'ulimit -n'
I also noticed that another segmentation statement gave the same error, but it worked once I removed some of the columns from the HASH arguments.

Can you help me?

thanks,
Pietro.

Comments

  • Hi Pietro,
    Looks like you have exceeded the OS open file descriptor limit when trying to open column files during segmentation. Running a purge() and a mergeout should help by consolidating ROS containers. You can also increase the OS limit by changing the fs.file-max parameter in the /etc/sysctl.conf file.

    /Sajan
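
A minimal sketch for checking the limit that the vertica.log warning above refers to. It compares the shell's `ulimit -n` value with the 65536 minimum quoted in that warning. The `check_nofile` helper name is made up for illustration, and the result is only meaningful when run as the OS user that starts Vertica.

```shell
#!/bin/sh
# Minimum number of open files quoted in the vertica.log warning.
SUGGESTED_MIN=65536

# check_nofile LIMIT: report whether a per-process open-file limit
# meets the suggested minimum. (Hypothetical helper, not a Vertica tool.)
check_nofile() {
    limit="$1"
    if [ "$limit" = "unlimited" ]; then
        echo "ok: unlimited"
    elif [ "$limit" -ge "$SUGGESTED_MIN" ]; then
        echo "ok: $limit"
    else
        echo "low: $limit (suggested minimum $SUGGESTED_MIN)"
    fi
}

# Soft limit of the current shell; child processes inherit it.
check_nofile "$(ulimit -n)"
```

Note that `ulimit -n` is the per-process limit; the system-wide ceiling is a separate kernel setting.
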
  • Hi,

    We may be able to help you isolate this. Please provide the following, and we'll check whether it's a known issue or, if we can reproduce it, log a new one and try to identify a workaround.

    The Vertica version you are on: run "select version();" in vsql and post the output in your reply.

    In vsql, run the following to export your current design, which includes the definitions for the table(s) in question, and attach that file to your reply:
    select export_catalog('/tmp/t1474_design.sql','DESIGN');

    At the OS level, look in the catalog directory for a file named ErrorReport.txt. It should contain a backtrace with a timestamp matching the last time you ran the CREATE PROJECTION and got the PANIC. Attach that to your reply.

    Finally, provide the SQL you execute that causes the panic.

    The ErrorReport.txt file will help us look for known issues matching the signatures in the backtrace. The design and the SQL that causes the panic will help us match a known issue or try to reproduce the problem. The version helps if it is a known issue, since we may be able to find a version in which it's fixed, and it also helps during attempts to reproduce if we don't find a known issue.
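
The requested items can be gathered from a shell on the affected node, roughly as sketched below. `CATALOG_DIR` and the `find_error_report` helper are placeholders introduced here, not Vertica names, and the vsql calls assume that vsql is on the PATH and can connect with its default settings.

```shell
#!/bin/sh
# Placeholder: set this to the node's actual catalog directory.
CATALOG_DIR="${CATALOG_DIR:-/path/to/catalog}"

# find_error_report DIR: print the path of every ErrorReport.txt under DIR.
# (Hypothetical helper for this sketch.)
find_error_report() {
    find "$1" -type f -name 'ErrorReport.txt' 2>/dev/null
}

# Version and design export, using the statements quoted in the reply.
# Skipped when vsql is not installed on this machine.
if command -v vsql >/dev/null 2>&1; then
    vsql -c "select version();"
    vsql -c "select export_catalog('/tmp/t1474_design.sql','DESIGN');"
fi

# Show each report with its modification time, to compare against the
# time of the last failed CREATE PROJECTION.
find_error_report "$CATALOG_DIR" | while IFS= read -r report; do
    ls -l "$report"
done
```
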

  • Thank you for your reply.
    I've tried your solution but it doesn't fix the problem. The node still goes down when I launch the statement. If I leave one specific column out of the hash, it doesn't.
    Is there a limitation on data type, length or cardinality?

    The affected column is of type NUMBER and contains both negative and positive values.
    I've already tried to segment by ABS(column) but I still get the error.

    Pietro.

  • version : Vertica Analytic Database v6.1.3-0
    (community edition)

    query :

    CREATE PROJECTION hr_sp.prejoin_maloprodaja_artikl_prodavaonica_datum_q5_2
    (
     godina_opis,
     mjesec_redni_broj_godina,
     artikl_potkategorija_sifra,
     artikl_sifra,
     artikl_opis,
     prod_skl_sifra,
     kolicina_osnovna_jm,
     mpv_s_pdv,
     nabavna_vrijednost,
     bruto_marza,
     partition_name
    )
    AS
     SELECT d_datum.godina_opis,
            d_datum.mjesec_redni_broj_godina,
            d_artikl.artikl_potkategorija_sifra,
            d_artikl.artikl_sifra,
            d_artikl.artikl_opis,
            d_prodavaonica_skladiste.prod_skl_sifra,
            f_maloprodaja.kolicina_osnovna_jm,
            f_maloprodaja.mpv_s_pdv,
            f_maloprodaja.nabavna_vrijednost,
            f_maloprodaja.bruto_marza,
            f_maloprodaja.partition_name
     FROM hr_sp.f_maloprodaja
    JOIN hr_sp.d_datum ON f_maloprodaja.datum_racuna_kljuc = d_datum.datum_kljuc
    JOIN hr_sp.d_artikl ON f_maloprodaja.artikl_kljuc = d_artikl.artikl_kljuc
    JOIN hr_sp.d_prodavaonica_skladiste ON f_maloprodaja.prodavaonica_kljuc = d_prodavaonica_skladiste.prod_skl_kljuc
     ORDER BY d_datum.godina_opis,
              d_datum.mjesec_redni_broj_godina,
              d_artikl.artikl_potkategorija_sifra,
              d_artikl.artikl_sifra,
              d_artikl.artikl_opis,
              d_prodavaonica_skladiste.prod_skl_sifra
    SEGMENTED BY HASH(D_ARTIKL.ARTIKL_SIFRA , D_ARTIKL.ARTIKL_OPIS, 
            d_prodavaonica_skladiste.prod_skl_sifra) ALL NODES ksafe 1

    ErrorReport https://dl.dropboxusercontent.com/u/75785015/ErrorReport.txt

    design https://dl.dropboxusercontent.com/u/75785015/t1474_design.sql
  • Hi,

    Thanks. I'll see what I can find. One additional question: which hash expression item is the one that, if left out, avoids the crash?
  • Hi all,

    If I may just jump in briefly:  Pietro, what exactly did you change, with regard to Steve's suggestion?  (Changes to sysctl.conf don't take effect immediately/automatically...)

    Thanks,
    Adam
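
To illustrate the point about sysctl.conf: a value written to that file is only read at boot or when `sysctl -p` is run as root, so the configured and running values can differ. A rough check, assuming a Linux host and that the system-wide open-file setting in question is `fs.file-max`; `configured_file_max` is a helper name invented for this sketch.

```shell
#!/bin/sh
# configured_file_max FILE: print the last uncommented fs.file-max value
# found in a sysctl.conf-style FILE (prints nothing if it is not set).
configured_file_max() {
    sed -n 's/^[[:space:]]*fs\.file-max[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" 2>/dev/null | tail -n 1
}

conf=$(configured_file_max /etc/sysctl.conf)
# Value the running kernel is actually using.
live=$(cat /proc/sys/fs/file-max 2>/dev/null)

if [ -n "$conf" ] && [ -n "$live" ] && [ "$conf" != "$live" ]; then
    echo "configured $conf but running $live: run 'sysctl -p' as root to apply"
else
    echo "configured: ${conf:-not set}, running: ${live:-unknown}"
fi
```
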
  • Hi,

    I was able to reproduce with the design and query you supplied. The d_prodavaonica_skladiste.prod_skl_sifra hash expression item seems to be the key one: removing it eliminates the panic. The backtrace matches a known issue that is fixed in version 7.0 and targeted for the next 6.1 service pack, which is not yet scheduled. It's specific to prejoins and a special check done to see if there are any columns in any involved projections that have long strings.

    Unfortunately there aren't any ideal workarounds in V6.1, as the error is random in nature, i.e. there is no rule you can follow to steer clear of it. Ultimately you'd be forced to redefine the hash expression, the prejoin projection's columns, and the SELECT columns to something that gives the segmentation you want but doesn't panic the server. Upgrading to V7 is the only immediate permanent solution that would allow you to retain the desired segmentation.
  • Hi Steve,
    thank you for your help. I think I am going to upgrade Vertica then.
    So, just to be clear, what is the error in this case? You said that the error involves a specific check for long strings, but that field is actually of type numeric.


    thanks again,
    Pietro

  • The upgrade to Vertica 7 fixed the problem! Thank you.
  • Hi,

    Regarding your prior question on the long string check and the column in question being numeric: the long string check occurs any time there's a prejoin. I couldn't find a solid reference, but I believe long strings are not allowed in prejoins' hash expressions, so the logic checks on creation of a prejoin that this rule isn't violated. The failure wasn't specifically due to the data type of the column in that table. Apparently the failure could happen due to various combinations of things. That table.column just happened to align things such that the long string check broke. Other table.column combos could have broken it as well.

    Glad we got to root cause and you were able to upgrade to V7 and get past it.
  • thanks again! :)
