Rebalancing stuck at 0%

Hi all :)

 

I added 10 nodes to my 10-node cluster (7.1.1-10).
After 26 hours (50TB), the rebalance has only 1 more table to go (out of 156 tables), a 22TB one.

 

select * from REBALANCE_TABLE_STATUS

 

But I see that the rebalance isn't running: the rebalance separated_percent is 0%, and in the locks table I'm getting:

 

And in the sessions table I see that the statement_id is null where the current_statement is "select rebalance_cluster();"

 

Does anyone know why the rebalancing doesn't run?

Comments

  • Hi,

    This SQL will give you a deeper understanding of what is going on with your rebalance task:

     

    SELECT node_name, session_id, session_start_timestamp, description
    FROM system_sessions
    WHERE session_type = 'REBALANCE_CLUSTER'

     

    I hope you find it useful.

     

    Thanks 

  • Hi,

    The only thing that I get from the query:

    SELECT node_name, session_id, session_start_timestamp, description
    FROM system_sessions
    WHERE session_type = 'REBALANCE_CLUSTER'
    and description is not null

     

    is results on only 1 node (node010) with the description "Txn: 130000012212eb0 'rebalance_cluster(background)'"

    And "select * from rebalance_projection_status" still shows 0% ... :(

  • Maybe I can stop the rebalancing and rerun it?

    I'm afraid that this will do more damage…

    I can run CANCEL_REBALANCE_CLUSTER(), but as the Vertica documentation says:

    A rebalance operation can take some time, depending on the number of projections and the amount of data they contain. HP recommends that you allow the process to complete uninterrupted. If you must cancel the operation, call the CANCEL_REBALANCE_CLUSTER function.

     

     

  • Hi,

    Check your disk space availability; rebalance needs extra disk space.

  • I think I have a problem here... If I select from the REBALANCE_TABLE_STATUS table, I get the last table that needs to be rebalanced:

    to_separate_bytes - 666,591,574,912 (620.8GB)

    to_transfer_bytes - 12,187,573,777,220 (11TB)
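
For reference, the sizes in parentheses above are binary units (GiB and TiB rather than decimal GB and TB). A minimal Python sketch of the conversion; the helper name `to_binary_units` is made up for illustration:

```python
def to_binary_units(num_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


print(to_binary_units(666_591_574_912))     # to_separate_bytes -> 620.8 GiB
print(to_binary_units(12_187_573_777_220))  # to_transfer_bytes -> 11.1 TiB
```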

     

    The free disk space that I have:

    node_name  | disk_space_free_gb | disk_space_used_gb | disk_space_total_gb
    -----------+--------------------+--------------------+---------------------
    v_node0010 |            1700.03 |            3829.36 |             5529.40
    v_node0009 |            1772.43 |            3756.96 |             5529.40
    v_node0008 |            1767.48 |            3761.91 |             5529.40
    v_node0007 |            1680.23 |            3849.17 |             5529.40
    v_node0006 |            1537.57 |            3991.82 |             5529.40
    v_node0005 |            1567.73 |            3961.66 |             5529.40
    v_node0004 |            1645.27 |            3884.12 |             5529.40
    v_node0003 |            1592.93 |            3936.46 |             5529.40
    v_node0002 |            1733.18 |            3796.21 |             5529.40
    v_node0001 |            1646.83 |            3882.56 |             5529.40

    v_node0011 |            3809.11 |            1720.29 |             5529.40
    v_node0012 |            3967.53 |            1561.86 |             5529.40
    v_node0013 |            3889.65 |            1639.74 |             5529.40
    v_node0014 |            3920.00 |            1609.39 |             5529.40
    v_node0015 |            3974.78 |            1554.61 |             5529.40
    v_node0016 |            3933.21 |            1596.18 |             5529.40
    v_node0017 |            3932.19 |            1597.20 |             5529.40
    v_node0018 |            3974.50 |            1554.89 |             5529.40
    v_node0019 |            3966.00 |            1563.39 |             5529.40
    v_node0020 |            3952.15 |            1577.25 |             5529.40

     

    The table is spread across nodes 1-10, and I need it rebalanced onto the new nodes too (11-20).

     

    Is that a problem?

    If so... what can I do?

     

  • Hi,

    Best practice is 40% free disk space; otherwise the rebalance can be very slow and will process the task in many small phases until it completes.
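
To compare the figures posted earlier in this thread with the 40% guideline, here is a rough Python sketch. The numbers are copied from the disk-space listing above (two of the original nodes and one new node). The function name and the idea of applying the threshold per node are assumptions made for illustration, not something Vertica provides.

```python
FREE_GUIDELINE = 0.40  # the 40% free-space guideline mentioned in this thread

# (free_gb, total_gb) copied from the listing earlier in the thread
nodes = {
    "v_node0001": (1646.83, 5529.40),
    "v_node0006": (1537.57, 5529.40),
    "v_node0011": (3809.11, 5529.40),
}


def free_fraction(free_gb: float, total_gb: float) -> float:
    """Fraction of the disk that is free, between 0 and 1."""
    return free_gb / total_gb


for name, (free_gb, total_gb) in nodes.items():
    frac = free_fraction(free_gb, total_gb)
    status = "ok" if frac >= FREE_GUIDELINE else "below guideline"
    print(f"{name}: {frac:.1%} free ({status})")
```

On these numbers the original nodes come out at roughly 28-30% free, under the guideline, while the new node is at about 69%.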

     

    Something to consider:

    Rebalance uses extra I/O and network resources, so you can monitor your rebalance task by watching your network and I/O subsystems with the vioperf and netperf utilities. This will indicate whether the process is hung or still running.

     

    I hope you will find it useful

     

    Thanks 

  • Hi Eli, thank you for your answers.

     

    Couple of things:

    1. 40% free disk space where? In the first 10 nodes (where the table is)? In the 10 new nodes (the ones I added)? Because I have 30% free disk space on each node of the first 10 nodes and 50% free disk space on each node of the 10 new nodes.
    2. By "very slow", do you mean it is possible that after 3 days the rebalance separated_percent is still 0%?
    3. In my situation, is there a way that I can complete the rebalance? Speed it up? How can I handle it?
    4. How can I monitor my rebalance task using the vioperf and netperf utilities? I know those utilities, but I'd be more than happy to hear how I can follow the task using their results.

    Thank you very much for your help :)

    Chen

  • OK... after 3 days (74 hours) all tables are rebalanced... done

  • Chen,

    It looks like you are OK in terms of disk space.


    Unix utilities will give you an indication of whether Vertica is running the rebalance or is just hanging (assuming no other activity is taking place in your cluster during the rebalance), e.g. if you see massive I/O activity on the /data FS (df -h is also an option for you).
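
One way to look for I/O activity without extra tools is to sample /proc/diskstats twice and compare the sector counters. A minimal Python sketch, assuming Linux; the device name `sdb` and the function names are placeholders, so substitute the device that backs your data directory:

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors


def parse_diskstats(text: str, device: str) -> tuple[int, int]:
    """Return (bytes_read, bytes_written) for one device from diskstats text."""
    for line in text.splitlines():
        fields = line.split()
        # fields[2] is the device name, fields[5] sectors read,
        # fields[9] sectors written.
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[5]) * SECTOR_BYTES, int(fields[9]) * SECTOR_BYTES
    raise ValueError(f"device {device!r} not found")


def io_delta(before: str, after: str, device: str) -> tuple[int, int]:
    """Bytes read and written between two diskstats snapshots."""
    r0, w0 = parse_diskstats(before, device)
    r1, w1 = parse_diskstats(after, device)
    return r1 - r0, w1 - w0


def sample(device: str, interval_s: float = 10.0) -> tuple[int, int]:
    """Take two snapshots interval_s seconds apart (Linux only)."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return io_delta(before, after, device)
```

Calling `sample("sdb")` on a node during the rebalance and seeing large read and write deltas suggests work is in progress. Values near zero on every node are a hint, not proof, that it is stalled.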

     

    More options to monitor progress:


    1) Rebalance refreshes projections, so you can look at dc_projection_checkpoint_epochs to see whether new epochs are being created for your projections.


    2) Take the transaction_id and statement_id values from dc_rebalanced_projections that are assigned to your task and query the execution_engine_profiles table; this will show real-time active stats.

     

    Thanks.

     

     
