Redistribute data across nodes
My current data usage looks like:
dev=> select * from disk_storage;
node_name | storage_path | storage_usage | rank | throughput | latency | storage_status | disk_block_size_bytes | disk_space_used_blocks | disk_space_used_mb | disk_space_free_blocks | disk_space_free_mb | disk_space_free_percent
---------------------+----------------------------------------------------+---------------+------+------------+---------+----------------+-----------------------+------------------------+--------------------+------------------------+--------------------+-------------------------
v_armdevdb_node0001 | /data/ARMDEVDB/v_armdevdb_node0001_catalog/Catalog | CATALOG | 0 | 0 | 0 | Active | 4096 | 29380292 | 114766 | 1104914 | 4316 | 3%
v_armdevdb_node0001 | /data/ARMDEVDB/v_armdevdb_node0001_data | DATA,TEMP | 0 | 0 | 0 | Active | 4096 | 29380292 | 114766 | 1104914 | 4316 | 3%
v_armdevdb_node0002 | /data/ARMDEVDB/v_armdevdb_node0002_catalog/Catalog | CATALOG | 0 | 0 | 0 | Active | 4096 | 28109057 | 109801 | 2376149 | 9281 | 7%
v_armdevdb_node0002 | /data/ARMDEVDB/v_armdevdb_node0002_data | DATA,TEMP | 0 | 0 | 0 | Active | 4096 | 28109057 | 109801 | 2376149 | 9281 | 7%
v_armdevdb_node0004 | /data/ARMDEVDB/v_armdevdb_node0004_catalog/Catalog | CATALOG | 0 | 0 | 0 | Active | 4096 | 2451305 | 9575 | 28033901 | 109507 | 91%
v_armdevdb_node0004 | /data/ARMDEVDB/v_armdevdb_node0004_data | DATA,TEMP | 0 | 0 | 0 | Active | 4096 | 2451305 | 9575 | 28033901 | 109507 | 91%
(6 rows)
How can I rebalance the disk usage?
Comments
The rebalance_cluster() function also failed to complete, probably due to lack of disk space.
Apart from dumping the database to files and restarting from scratch, is there any hope of rebalancing in this case?
To see how much space each projection uses on each node, run:
select node_name, projection_name, sum(used_bytes)
from projection_storage
group by node_name, projection_name
order by projection_name, sum(used_bytes) desc;
If you have a lot of non-superprojections, and you can drop and recreate them without significantly disrupting user queries (or you can get a maintenance window), get the creation DDL for those projections using:
select export_objects('','<projection_name>');
Then recreate each projection with a modified segmentation clause that distributes the data evenly, and don't forget to use the MODULARHASH or HASH function in the segmentation clause.
You should be able to iterate through the projections and gradually free up space on the heavily loaded nodes.
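For illustration, here is a minimal sketch of that drop-and-recreate cycle. The table store_sales, its columns, and the projection names are placeholders, not objects from this thread; adjust the column list and hash key to your own schema.
-- Capture the current definition before touching anything:
select export_objects('', 'public.store_sales_skewed');
-- Drop the skewed non-superprojection to free space on the full nodes
-- (a K-safe design will have buddy projections, e.g. _b0/_b1, to handle the same way):
drop projection public.store_sales_skewed;
-- Recreate it with a segmentation clause that spreads rows evenly across nodes:
create projection public.store_sales_seg
as select sale_id, store_id, sale_date, amount
   from public.store_sales
segmented by hash(sale_id) all nodes;
-- Repopulate the new projection (synchronous refresh of the table's projections):
select refresh('public.store_sales');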
Hope that helps.
/Sajan
If you see data skew in projections and only want to rebalance a few of them, that is possible in Vertica 7.1 (Dragline):
REBALANCE_DATA_IN_VERTICA 7.1
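For reference, Vertica 7.1 and later also provide a per-table rebalance function, so a single skewed table can be redistributed without touching the rest of the cluster. A minimal sketch, with a placeholder table name:
-- Rebalance only the projections of one table instead of the whole cluster:
select rebalance_table('public.store_sales');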
Hope this helps.
NC
I am in a bit of a pickle here, I think: I am on Vertica 6.1.2-0 with no real chance of an upgrade at the moment.
The two largest projections in the entire database are superprojections; they are an order of magnitude larger than the others. If I create another superprojection, it will probably fail to be created on the two overloaded nodes because they have no disk space.
Not sure where to go now, any ideas?
As you can see, it is not refreshed.
Follow Sajan's notes. Create a segmented superprojection and drop the unsegmented one. You should be good.
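A rough sketch of that superprojection swap, using the placeholder table big_fact and placeholder column fact_id; the real table, columns, and projection names will differ:
-- New superprojection, segmented so each node holds roughly an even share of the rows:
create projection public.big_fact_seg
as select * from public.big_fact
segmented by hash(fact_id) all nodes;
-- Populate it in the background and watch for the refresh to finish:
select start_refresh();
select projection_name, refresh_status from projection_refreshes
 where projection_name = 'big_fact_seg';
-- Once it is refreshed, advance the ancient history mark and drop the old
-- unsegmented superprojection (Vertica will refuse while it is still needed):
select make_ahm_now();
drop projection public.big_fact_unseg;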
Sashi
~ Vertica Support