refresh vs rebalance
Navin_C
Vertica Customer ✭
Hello All,
While re-segmenting projections across nodes, I stumbled upon this question as I was trying to check the status of the re-segmentation:
What is the basic difference between REFRESH and REBALANCE?
From my point of view:
REFRESH - Populating data into empty projections and making them ready to answer queries
REBALANCE - Redistributing data across the cluster again, on all nodes or according to the projection definition (SEGMENTED BY/UNSEGMENTED)
If the above statements are right, when do we use the PROJECTION_REFRESHES table and when do we use the REBALANCE_PROJECTION_STATUS table? The latter does not show records as expected (transferred_bytes/to_transfer_bytes).
Thanks
Comments
Regarding REFRESH, your understanding is right: it populates the projections with data.
On the other hand, HP Vertica automatically rebalances your database when you add or remove nodes. You can also trigger a rebalance manually
using the Administration Tools or SQL functions.
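For reference, here is a minimal sketch of the SQL functions involved, assuming a hypothetical table `public.my_table`. REFRESH runs in the foreground for the table you name, START_REFRESH starts a background refresh of any projections that are not up to date, and REBALANCE_CLUSTER rebalances the whole cluster and returns when it finishes. Function availability can vary between Vertica versions, so check the SQL reference for yours.

```sql
-- Hypothetical table name; substitute your own schema.table.
-- Foreground refresh of the projections anchored on one table:
SELECT REFRESH('public.my_table');

-- Background refresh of every projection that is not yet up to date:
SELECT START_REFRESH();

-- Rebalance data across all nodes in the cluster (returns when complete):
SELECT REBALANCE_CLUSTER();
```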
In the case of node addition, REBALANCE populates the new nodes with segmented projections and duplicates unsegmented projections on them.
Whether the rebalance process is started manually or automatically, it takes the following steps:
==> For segmented projections, HP Vertica creates new (renamed) segmented projections that are identical in structure to the existing projections,
but which have their data distributed across all nodes. The rebalance process then refreshes all new projections,
sets the Ancient History Mark (AHM) to the greatest allowable epoch (now), and drops all of the old segmented projections.
All new buddy projections have the same base name so they can be identified as a group.
==> For unsegmented projections, HP Vertica leaves the existing projections unmodified, creates new projections on the new nodes, and refreshes them.
After the data has been rebalanced, HP Vertica drops:
Duplicate buddy projections with the same offset
Duplicate replicated projections on the same node
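To see the renamed projections grouped by base name while a rebalance is in progress, a query like the following can help. It is a sketch that assumes a hypothetical table `my_table` and the PROJECTIONS catalog columns `projection_basename`, `is_segmented` and `is_up_to_date` as documented for recent releases; verify them against your version.

```sql
-- List a table's projections; buddies created by the rebalance share a base name.
-- my_table is a hypothetical table name.
SELECT projection_basename,
       projection_name,
       is_segmented,
       is_up_to_date
FROM PROJECTIONS
WHERE anchor_table_name = 'my_table'
ORDER BY projection_basename, projection_name;
```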
To check rebalance progress once it has started, use REBALANCE_TABLE_STATUS. It has data whether you initiated the rebalance through admintools or manually with SQL.
For example:
-- is_latest keeps only the most recent rebalance run for each table
select table_name, rebalance_method, duration_sec, transferred_percent, transferred_bytes, to_transfer_bytes
from REBALANCE_TABLE_STATUS where is_latest and transferred_percent < 100
order by transferred_percent desc;
The query above shows the tables for which the rebalance is still pending.
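Since the question also asked about REBALANCE_PROJECTION_STATUS, here is a sketch of the same check at projection level instead of table level. It assumes that table exposes the same progress columns as REBALANCE_TABLE_STATUS (including `is_latest`); confirm the column list in the system table reference for your version before relying on it.

```sql
-- Per-projection rebalance progress, most recent rebalance only.
-- Column names assumed to mirror REBALANCE_TABLE_STATUS; verify on your version.
SELECT projection_name,
       anchor_table_name,
       rebalance_method,
       transferred_percent,
       transferred_bytes,
       to_transfer_bytes
FROM REBALANCE_PROJECTION_STATUS
WHERE is_latest
ORDER BY transferred_percent;
```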
Regards,
Abhishek
Your statements are right:
REFRESH - Populating data into empty projections and making them ready to answer queries
REBALANCE - Redistributing data across the cluster again, on all nodes or according to the projection definition (SEGMENTED BY/UNSEGMENTED)
You don't check a refresh in the rebalance tables. A rebalance is done when you add nodes to or remove nodes from the cluster.
Refresh is when you want to put data into empty projections.
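To follow a refresh, PROJECTION_REFRESHES is the table to query. A minimal sketch, assuming the column names from the system table reference:

```sql
-- Most recent refresh activity first; is_executing shows refreshes still running.
SELECT node_name,
       projection_name,
       anchor_table_name,
       refresh_status,
       refresh_method,
       refresh_start,
       refresh_duration_sec,
       is_executing
FROM PROJECTION_REFRESHES
ORDER BY refresh_start DESC;
```

PROJECTION_REFRESHES keeps history of past refreshes until it is cleared (for example with CLEAR_PROJECTION_REFRESHES()), so older rows may appear alongside the current run.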
A rebalance is normally done when your cluster size has changed and you need to redistribute the data. When you load data into the cluster, the data may already be split into segments, so when you add nodes, Vertica just moves the files to the new nodes and 'rebalances' the data between them. You can read more about this in the documentation under Elastic Cluster.
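If you want to look at the elastic cluster settings, the ELASTIC_CLUSTER system table shows them. This is a minimal sketch; the exact columns, and whether local segments still matter, depend on your Vertica version, so check it against the Elastic Cluster documentation.

```sql
-- Inspect elastic cluster settings (scaling factor, local segment status, etc.).
SELECT * FROM ELASTIC_CLUSTER;

-- Optional and version dependent: keep data in local segments so that a later
-- rebalance can move whole segments between nodes instead of splitting containers.
SELECT ENABLE_LOCAL_SEGMENTS();
```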
You could also rebalance by refresh: when you add a node to the cluster, you create a new projection and 'refresh' it. This method copies the data, so you need enough space to hold an extra projection, and once the new projection is refreshed you should drop the old one.
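A sketch of the rebalance-by-refresh approach, with hypothetical names throughout (table `public.sales`, columns `sale_id` and `amount`, old projection `public.sales_old_super`). The MAKE_AHM_NOW step mirrors what the automatic rebalance does before dropping old projections; whether you need it, and the exact CREATE PROJECTION options, depend on your version and K-safety setup.

```sql
-- Hypothetical names: public.sales, sale_id, amount, public.sales_old_super.
-- 1. Create a new projection segmented across all nodes, including the new ones.
CREATE PROJECTION public.sales_new_super AS
SELECT sale_id, amount
FROM public.sales
ORDER BY sale_id
SEGMENTED BY HASH(sale_id) ALL NODES KSAFE 1;

-- 2. Populate it. This copies the data, so you need room for an extra projection.
SELECT REFRESH('public.sales');

-- 3. Advance the Ancient History Mark so the old projection is no longer needed.
SELECT MAKE_AHM_NOW();

-- 4. Drop the old projection once the new one is up to date.
DROP PROJECTION public.sales_old_super;
```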
Does that make sense? There is so much to say about both terms that I don't think we can fit it all into a community post. If you have any specific questions, please let us know.
Eugenia
Thanks, Eugenia