Backup and restore a single schema with data to a different single node
I have a 4-node prod Vertica cluster with a couple of schemas, and I have now created a single-node Vertica instance for staging. I want to take a backup of just one schema, with data, from production and restore it to the single-node staging instance.
Comments
You can take an object-level backup in Vertica, but I don't think those backups are compatible with, or can be restored into, a different database.
Instead, you can use the EXPORT TO VERTICA statement to export data from the production cluster to the staging cluster.
Hope this helps.
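For reference, an object-level backup is driven by a vbr configuration file; a minimal sketch (the snapshot name, backup paths, and the use of the node addresses from this thread are all assumptions for illustration):

```ini
[Misc]
; back up only the BigData schema
snapshotName = bigdata_objects
objects = BigData
restorePointLimit = 1

[Database]
dbName = vabs
dbUser = dbadmin

[Transmission]

[Mapping]
; node = backupHost:backupDir
v_vabs_node0001 = 10.0.1.121:/backup/vabs
v_vabs_node0002 = 10.0.1.122:/backup/vabs
v_vabs_node0003 = 10.0.1.123:/backup/vabs
v_vabs_node0005 = 10.0.1.124:/backup/vabs
```

You would run it with `vbr -t backup -c objects.ini`; as noted above, though, restoring this into a different database is the sticking point.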
I tried this after logging in to production:
vabs=> CONNECT TO VERTICA vabs USER dbadmin PASSWORD '*******' ON '*.*.*.*',5433;
CONNECT
vabs=> EXPORT TO VERTICA vabs.BigData.Country AS select * from vabs.BigData.Country;
ERROR 4040: Network address [] could not be resolved for node [v_vabs_node0001]
vabs=>
Did you follow these steps before exporting?
1. Create a network interface for each node.
2. Alter each node to use the export network interface created above.
3. Have the same table structure on the other cluster.
4. Connect to the database on the other cluster using the export address set for that cluster (for the export address, check the nodes table on that cluster).
5. Export using EXPORT TO VERTICA.
Let us know if you have performed these steps.
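The steps above can be sketched in vsql, run from the prod cluster (the interface name prod_export is made up; the addresses come from the nodes output in this thread, and steps 1–2 must be repeated for every node):

```sql
-- Step 1: create a network interface per node
CREATE NETWORK INTERFACE prod_export ON v_vabs_node0001 WITH '10.0.1.121';

-- Step 2: point each node's export address at that interface
ALTER NODE v_vabs_node0001 EXPORT ON prod_export;

-- Step 4: connect to the staging cluster (step 3: the target table
-- BigData.Country must already exist there with the same structure)
CONNECT TO VERTICA vabs USER dbadmin PASSWORD '*****' ON '10.0.1.127',5433;

-- Step 5: export
EXPORT TO VERTICA vabs.BigData.Country AS SELECT * FROM BigData.Country;
```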
Stage:
node_name | node_id | node_state | node_address | export_address | catalog_path | is_ephemeral
---------------------+-------------------+------------+--------------+----------------+------------------------------------------------------------+--------------
v_vabs_node0001 | 45035996273704980 | UP | 10.0.1.127 | 10.0.1.127 | /data/vertica/vabs/v_vabs_node0001_catalog/Catalog | f
Prod
vabs=> select * from nodes;
node_name | node_id | node_state | node_address | export_address | catalog_path | is_ephemeral
---------------------+-------------------+------------+--------------+----------------+------------------------------------------------------------+--------------
v_vabs_node0001 | 45035996273704970 | UP | 10.0.1.121 | 10.0.1.121 | /data/vertica/vabs/v_vabs_node0001_catalog/Catalog | f
v_vabs_node0002 | 45035996415221126 | UP | 10.0.1.122 | 10.0.1.122 | /data/vertica/vabs/v_vabs_node0002_catalog/Catalog | f
v_vabs_node0003 | 45035996415221130 | UP | 10.0.1.123 | 10.0.1.123 | /data/vertica/vabs/v_vabs_node0003_catalog/Catalog | f
v_vabs_node0005 | 45035998153802294 | UP | 10.0.1.124 | 10.0.1.124 | /data/vertica/vabs/v_vabs_node0005_catalog/Catalog | f
I want to restore a schema from prod to staging
1. First I log in to the prod database (10.0.1.124):
/opt/vertica/bin/vsql -h 10.0.1.124 -U bigdatareadonly -w ******** -d vabs
2. I tried to connect to the staging database node (10.0.1.127) with the command below:
vabs=> CONNECT TO VERTICA vabs USER dbadmin PASSWORD '*****' ON '10.0.1.127',5433;
CONNECT
vabs=> EXPORT TO VERTICA vabs.BigData.Country AS select * from BigData.Country;
ERROR 4126: No valid address found for subnet [10.1.1.0]
Prod:
node_id | node_name | interface | ip_address | subnet | mask | broadcast_address
-------------------+---------------------+-----------+------------+------------+-----------------+-------------------
45035996273704970 | v_vabs_node0001 | lo | 127.0.0.1 | 127.0.0.0 | 255.0.0.0 | 127.0.0.1
45035996273704970 | v_vabs_node0001 | lo:0 | 10.0.1.126 | 10.0.1.126 | 255.255.255.255 | 127.255.255.255
45035996273704970 | v_vabs_node0001 | eth0 | 10.0.1.121 | 10.0.1.0 | 255.255.255.0 | 10.0.1.255
45035996273704970 | v_vabs_node0001 | eth0 | 10.0.1.126 | 10.0.1.126 | 255.255.255.255 | 10.0.1.126
45035996415221130 | v_vabs_node0003 | lo | 127.0.0.1 | 127.0.0.0 | 255.0.0.0 | 127.0.0.1
45035996415221130 | v_vabs_node0003 | lo:0 | 10.0.1.126 | 10.0.1.126 | 255.255.255.255 | 127.255.255.255
45035996415221130 | v_vabs_node0003 | eth0 | 10.0.1.123 | 10.0.1.0 | 255.255.255.0 | 10.0.1.255
45035998153802294 | v_vabs_node0005 | lo | 127.0.0.1 | 127.0.0.0 | 255.0.0.0 | 127.0.0.1
45035998153802294 | v_vabs_node0005 | lo:0 | 10.0.1.126 | 10.0.1.126 | 255.255.255.255 | 127.255.255.255
45035998153802294 | v_vabs_node0005 | eth0 | 10.0.1.124 | 10.0.1.0 | 255.255.255.0 | 10.0.1.255
45035996415221126 | v_vabs_node0002 | lo | 127.0.0.1 | 127.0.0.0 | 255.0.0.0 | 127.0.0.1
45035996415221126 | v_vabs_node0002 | lo:0 | 10.0.1.126 | 10.0.1.126 | 255.255.255.255 | 127.255.255.255
45035996415221126 | v_vabs_node0002 | eth0 | 10.0.1.122 | 10.0.1.0 | 255.255.255.0 | 10.0.1.255
Stage:
node_id | node_name | interface | ip_address | subnet | mask | broadcast_address
-------------------+---------------------+-----------+------------+-----------+---------------+-------------------
45035996273704980 | v_vabs_node0001 | lo | 127.0.0.1 | 127.0.0.0 | 255.0.0.0 | 127.0.0.1
45035996273704980 | v_vabs_node0001 | eth0 | 10.0.1.127 | 10.0.1.0 | 255.255.255.0 | 10.0.1.255
We don't have a public IP in our Vertica cluster.
For large data sets, EXPORT TO VERTICA performs poorly.
IMO a better solution is to dump the data with vsql and load it with COPY on the other side; you can apply compression on the fly.
PS: If you use the method Navin suggested, don't do the COPY in one stream; split it with filters, for example (EXPORT TO VERTICA performs poorly with large data sets):
- WHERE id >= 1 and id <= 100000
- WHERE id >= 100001 and id <= 200000
...
Also, set the CompressNetworkData configuration parameter; it will improve network throughput (see General Parameters in the documentation).
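A sketch of that approach, piping a filtered vsql dump through gzip into a COPY on the staging node (the hosts, credentials, table, and id range are illustrative; repeat with the next WHERE range until the table is copied):

```shell
# Dump one id range from prod, compress in flight, load into staging.
vsql -h 10.0.1.121 -U dbadmin -w '*****' -d vabs -At -F $'\t' \
  -c "SELECT * FROM BigData.Country WHERE id >= 1 AND id <= 100000" \
  | gzip \
  | ssh dbadmin@10.0.1.127 \
      "gunzip | vsql -U dbadmin -w '*****' -d vabs \
         -c \"COPY BigData.Country FROM STDIN DELIMITER E'\\t' DIRECT\""
```

The gzip/gunzip pair trades CPU for bandwidth, which is the right trade on a slow link between the clusters.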
Try creating a subnet with the load balancer IP.
If this works, you can use public network for exporting data to your stage cluster.
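In recent Vertica versions you can define a named subnet and route export traffic over it database-wide; a sketch, assuming the subnet name export_subnet and the 10.0.1.0 eth0 subnet shown above:

```sql
-- Define a subnet covering addresses both clusters can reach
CREATE SUBNET export_subnet WITH '10.0.1.0';

-- Route import/export traffic for the whole database over it
ALTER DATABASE vabs EXPORT ON export_subnet;
```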
select * from vs_network_interfaces;
select * from nodes;
Is this specifically with large data sets and limited bandwidth between the two clusters? EXPORT TO VERTICA is optimized for high-bandwidth setups, so vsql + gzip (or bzip2, etc) can have a significant advantage over slow networks.
@adam
>> Where have you had trouble with it?
Performance degrades; that's why I'm recommending bounding it with filters. With small data sets it works well.
>>EXPORT TO VERTICA is optimized for high-bandwidth setups
Hmm... I don't remember, and unfortunately I no longer have the network configuration, but I think it was on Amazon. I will try to test it on a local network.
@all
Don't take my word on EXPORT TO VERTICA; test it yourselves.
Actually, it would be interesting to build a graph: num rows vs. time.