ANALYZE_STATICS stuck

Hello,

I run a 3 node cluster on AWS with Vertica 7.0.1. The DB itself is very small, about 20G compressed on disk.

When running ANALYZE_STATISTICS, it seems to stuck very frequently. Sometimes it runs in a few seconds and sometimes it doesn't finish even in 30 minutes.
And when this happens, if I try to close the session , that doesn't help either, even after issuing close_session, ANALYZE_STATISTICS will just stuck in there.

Can you please advise what is the work around on this?

Thanks,
Tibor




Comments

  • Navin_CNavin_C Vertica Customer
    Hi Tibor,

    If you want to check if ANALYSE_STATISTICS is still running ...open a duplicate session and checking query_profiles table if the statement is still running. The query_type column will be 'UTILITY', check with the is_executing column.

    Secondly if you wish to stop ANALYZE_STATISTICS after a certain period sue INTERUPT_STATEMENT form a duplicate statement

    https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/VerticaFunctions/INTERRUPT_STATEMENT.htm  

    Hope this helps
  • Thanks Navine.

    But the real issue is, why this is happening? Here is the output from the sessions table for the running queries. Again, the db is on 3 node cluster, with very little data (20Gb).

    And as you can see, SELECT ANALYZE_STATISTICS('') is running for over 8 minutes now.
    And why this is happening, other queries get hold up. See  SELECT * FROM query_profiles LIMIT 100;

    Is this a 7.0.1 specific issue?  This is a pretty serious issue, we cannot safely updating statistics.

     datediff |    session_id     | statement_id |                                                              current_statement

    ----------+-------------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------

          460 | vdb1-1934:0x13422 |            1 | SELECT ANALYZE_STATISTICS('');

          249 | vdb1-1934:0x13a78 |            1 | SELECT * FROM query_profiles LIMIT 100;

            0 | vdb1-1934:0x13cac |            1 | SELECT datediff(second, statement_start, GETDATE()), session_id, statement_id, current_statement FROM SESSIONS WHERE current_statement <>''



    At the same time, if I am trying to interrupt the held up query, my request in VSQL seem to hang up too!

    dbadmin=> SELECT * FROM query_profiles LIMIT 100;

    ^CCancel request sent

    ^CCancel request sent

    ^CCancel request sent

    ^CCancel request sent

    ^CCancel request sent

    ^CCancel request sent

    ^CCancel request sent

    ^CCancel request sent

    Please advise.,
    Thank you,
    Tibor

  • I see your on AWS. That could be a clue to the problem. We'd need a lot more information to ascertain why this is occurring. If I had to wager a guess, it would probably be somewhere in the area of disk performance, the location of the catalog mount points, and maybe how AWS is multi-threading some of its operations around that location specifically. It would be incredibly hard to debug via a community forum.
    Unless someone from support or engineering has seen the issue before and knows a solution.

    So, some generic questions:
    What kind of disk?
    Where is the disk (co-located with the CPU, or is it some sort of SAN implementation in the cloud?)
    What is the disk performance?
    Are the nodes in the same rack?
    What kind of hardware?

  • Hi Curtis,

    The specs:
    1) 3x cc3.2xl nodex (15G RAM, 8 core)
    2) Using local storage with the 2x SSDs in RAID 0  (to avoid EBS perf issues)

    The local SSD are decent, Write: 150 MB/s, Read: 200 MB/s, SkipReads: 6400 seeks/s based on vioperf.

    I seriously doubt that this is a storage/instance performance issue, as don't see that we with anything else. Everything runs fast consistently, except ANALYZE_STATISTICS(''), which runs sometimes 30 seconds and sometimes just stucks until I manage to close the session that it was started in.

    I did at least 30 Vertica installations on various systems, but this is the first time I saw this issue, so I wonder if this is a 7.0.1 issue.


    Thanks for any pointers.
    Tibor




  • Further analyzed this issue. I cannot even close the session or run INTERUPT_STATEMENT when this happens. And the more queries I issue when ANALYZE_STATISTICS is running for a long time, the more of them get stuck waiting.

    So the only way I can interrupt the statement if either  shut down the connection physically between the client and the server. 

    Tried the same with vsql running on one of the server nodes itself and saw exact same issue, ie. I couldn't even close the session that was opened by vsql when ANALYZE_STATISTICS was running. So I had kill the vsql process, which in tern ended the session.

    This is such a huge issue, can you please provide some assistance?
    Thanks,
    Tibor

  • Tibor, I'm not sure I can debug this without really debugging your system more thoroughly. Frankly, I'm not even sure where to start. I've never encountered this problem before, so don't really know what could be causing. I am curious, though. how often are you running analyze_statistics?  It sounds like you're running it way more frequently than might be necessary.  You could also try running analyze_histogram. It does the same thing, but you can lower the sample percentage. So, a smaller sample size with analyze_histogram might succeed where analyze_statistics is failing.


Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file