Vertica 7.2.2 - Copycluster got empty tables on destination
stestaz
Community Edition User ✭
Hi All,
I'm running vbr with the copycluster option in order to clone my production cluster to a staging cluster; each cluster has one node and the Vertica version is the same.
The vbr process runs correctly and no errors are reported, but on the destination some tables are empty. I've tried to drop the empty tables on the destination, but the result is the same. I've tried to delete the destination db too, but with no luck. This is my ini file:
[Misc]
snapshotName = dwh_snapshot
dest_verticaBinDir = /opt/vertica/bin
restorePointLimit = 3
objectRestoreMode = createOrReplace
[Database]
dbName = dwh
dbUser = dbadmin
dbPromptForPassword = True
[Transmission]
port_rsync = 50000
[Mapping]
v_dwh_node0001 = xxx.xxx.xxx.245:
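For completeness, this is the command I run (assuming the ini above is saved as copycluster.ini; the file name here is just an example):
vbr --task copycluster --config-file copycluster.ini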
Any suggestions?
Answers
Try running with the option "--debug 3" and check the logs for any additional messages that might show why some tables are not copied. Also check the temp dir (default /tmp/vbr) to see if there are additional log files there. You may need to set the temp dir explicitly in the [Misc] section to ensure proper permissions and enough free space to copy objects:
[Misc]
tempDir=/path/to/vbrTemp
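For example, something along these lines (the config file name is just a placeholder):
vbr --task copycluster --config-file copycluster.ini --debug 3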
I've tried as you suggested, but with no luck; there are no errors in the log file and the tables are still empty.
@stestaz - Any info in the Vertica log on the destination node?
This is what I found in the vertica.log:
2020-01-03 11:21:18.678 Init Session:0x7f1cac011dd0-a00000000120a5 [Txn] Begin Txn: a00000000120a5 'Moveout: (Table: dmtrt.org_d_uo) (Projection: dmtrt.org_d_uo_v1_super)'
2020-01-03 11:21:18.679 Init Session:0x7f1cac011dd0-a00000000120a5 [EE] (a00000000120a5) TM Moveout: WOS is empty at epoch 16007933; nothing to do
2020-01-03 11:21:18.679 Init Session:0x7f1cac011dd0-a00000000120a5 [Txn] Starting Commit: Txn: a00000000120a5 'Moveout: (Table: dmtrt.org_d_uo) (Projection: dmtrt.org_d_uo_v1_super)'
Is "Table: dmtrt.org_d_uo" one of the tables that was not copied? If so, were there any other messages related? It might help to grep the name of a table that is not copied to determine whether there were any messages at all related to the table or its projections.
Also, it would help to see the vbr logs for the copy at debug level 3, even if they appeared successful, in case there were messages there that might help us understand what is happening.
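For example, something like this on each node (the vertica.log path below assumes the default catalog layout for a database named dwh; adjust it to your installation):
grep -i 'org_d_uo' ~/dwh/v_dwh_node0001_catalog/vertica.log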
Hi Bryan, yes dmtrt.org_d_uo is one of the tables that was not copied.
On the source machine I have a log file from vbr, which I've attached, but I can't find anything interesting in it, though I'm not very experienced. There are also some logs in the folder /tmp/vbr_rsyncd; can they be useful?
I see nothing unexpected in the vbr log. The log files in /tmp/vbr_rsyncd may be useful since they may show an issue copying specific files.
How many tables are missing data? I wonder if there is some other issue I'm not thinking of.
I've tried to compare tables by size with this query:
select schema_name as table_schema,
anchor_table_name as table_name,
round(sum(used_bytes)/(1024^2), 2) as used_mb
from v_monitor.storage_containers sc
join v_catalog.projections p
on sc.projection_id = p.projection_id
group by table_schema,
table_name
order by used_mb desc;
and there is a difference in nearly all tables; it looks like the restore is partial rather than full. I'm going crazy...
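As a further cross-check, I could also compare raw row counts on both clusters for one of the affected tables (table name taken from the log above):
select count(*) from dmtrt.org_d_uo;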
I've found another strange thing on the source host: the epoch reported by vbr --task listbackup is stuck at 15626842. Can the two things be connected?
That's a bit of a concern, and would explain why there is missing data: only data up to that epoch is backed up.
Let's check whether the epoch is actually advancing in Vertica. The following query will show the Last Good Epoch and the Ancient History Mark:
SELECT /*+label(LGEAHMlagging)*/ current_epoch,
       CASE WHEN (last_good_epoch = -1) THEN current_epoch - 1 ELSE last_good_epoch END AS lge_epoch,
       ahm_epoch,
       CASE WHEN (last_good_epoch = -1) THEN 1 ELSE (current_epoch - last_good_epoch) END AS lge_lag,
       current_epoch - ahm_epoch AS ahm_lag
FROM system;
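If ahm_lag turns out to be very large and the epoch really is stuck, the AHM can usually be advanced manually; note that this discards history, so use it with care and only when all nodes are up:
SELECT make_ahm_now();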
Hi Bryan, unluckily my cluster had a problem and I had to restore from the old backup (losing all data after the backup epoch). Now I think the epoch is not stuck at all, and this is the result of the query.
I think the problem with synchronization and backup is now solved, but a lot of data has been lost.