Vertica takes a long time to start
Hi All,
I was worried about performing a VM snapshot while Vertica was running, so I decided to stop it first.
Do I have to stop Vertica to perform VM snapshots?
Vertica is taking a long time to become available; the last time I restarted it, it took about 45 minutes to come back up.
Below is the vertica.log content:
2017-02-23 12:08:57.021 unknown:0x7f0320a97700 [Txn] Catalog sequence set, seq num=a6de39d1, nodeID=a
2017-02-23 12:08:57.021 unknown:0x7f0320a97700 [Txn] Found my node (v_drdata_node0001) in the catalog
2017-02-23 12:08:57.021 unknown:0x7f0320a97700 [Txn] Catalog info: version=0x2bea3ec, number of nodes=1, permanent #=1, K=0
2017-02-23 12:08:57.022 unknown:0x7f0320a97700 [Txn] Catalog info: current epoch=0x19eaf38
2017-02-23 12:08:57.207 unknown:0x7f0320a97700 [Catalog] Catalog OID generator updated based on LOCAL tier catalog
2017-02-23 12:08:57.341 unknown:0x7f0320a97700 [Init] Catalog loaded
2017-02-23 12:08:57.568 unknown:0x7f0320a97700 [Comms] About to launch spread with '/opt/vertica/spread/sbin/spread -c /app/catalog/drdata/v_drdata_node0001_catalog/spread.conf'
2017-02-23 12:08:57.576 unknown:0x7f0320a97700 [Init] Listening on port: 5433
2017-02-23 12:08:57.577 unknown:0x7f0320a97700 [Init] About to fork
2017-02-23 12:08:57.584 unknown:0x7f0320a97700 [Init] About to fork again
2017-02-23 12:08:57.591 unknown:0x7f0320a97700 [Init] Completed forking
2017-02-23 12:08:57.592 unknown:0x7f0320a97700 [Init] Startup [Connecting to Spread] Connecting to spread 4803
2017-02-23 12:08:58.726 unknown:0x7f0320a97700 [Comms] connected to spread as #node_a#N172023201160
2017-02-23 12:08:58.726 unknown:0x7f0320a97700 [Init] PID=1722
2017-02-23 12:08:58.726 unknown:0x7f0320a97700 [Init] Start reading DataCollector information
2017-02-23 12:08:58.730 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 0 / 1179128391
2017-02-23 12:12:21.180 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 124070325 / 1179128391
2017-02-23 12:14:15.501 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 245356234 / 1179128391
2017-02-23 12:16:16.708 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 382377173 / 1179128391
2017-02-23 12:18:33.052 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 502611093 / 1179128391
2017-02-23 12:21:17.771 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 621498606 / 1179128391
2017-02-23 12:23:14.466 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 739949464 / 1179128391
2017-02-23 12:25:04.137 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 857987793 / 1179128391
2017-02-23 12:27:35.929 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 986765394 / 1179128391
2017-02-23 12:28:38.222 unknown:0x7f0320a97700 [Init] Startup [Read DataCollector] Inventory files (bytes) - 1105188507 / 1179128391
2017-02-23 12:28:46.081 unknown:0x7f0320a97700 [Init] Done reading DataCollector information
2017-02-23 12:28:46.082 Main:0x6506fb0 @v_drdata_node0001: 57V03/4973: The UDx zygote process is down, restarting it...
2017-02-23 12:28:47.019 Main:0x6506fb0 [Init] Setting up Messenger service
2017-02-23 12:28:47.020 Main:0x6506fb0 [Dist] Cluster point-point listener port 5434
2017-02-23 12:28:47.021 Main:0x6506fb0 @v_drdata_node0001: 00000/3885: Locking [/app/catalog/drdata/v_drdata_node0001_catalog/vertica.pid], fdLocked == 13
2017-02-23 12:28:47.055 Main:0x6506fb0 [Init] Using "Brazil/East" as the default timezone
2017-02-23 12:28:47.056 Main:0x6506fb0 [Txn] Begin Txn: fff0000000000cc1 'Initialization'
2017-02-23 12:28:47.057 Main:0x6506fb0 [Catalog] Running SINGLE upgrade tasks v7.0.2-5(149346) -> v7.0.2-5(149346)
2017-02-23 12:28:47.057 Main:0x6506fb0 [Catalog] License Audit Record Hash Check Passed.
2017-02-23 12:28:47.057 Main:0x6506fb0 [Catalog] Full License Hash Check Passed.
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool general - Queries: 10000 Threads: 8192 File Handles: 39263 Memory(KB): 9961926
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool sysquery - Queries: 10000 Threads: 8245 File Handles: 39518 Memory(KB): 10027462
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool sysdata - Memory(KB): 1048576
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool wosdata - Memory(KB): 4194304
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool tm - Queries: 3 Threads: 8358 File Handles: 40061 Memory(KB): 10166726
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool refresh - Queries: 10000 Threads: 8192 File Handles: 39263 Memory(KB): 9961926
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool recovery - Queries: 3 Threads: 8192 File Handles: 39263 Memory(KB): 9961926
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool dbd - Queries: 10000 Threads: 8192 File Handles: 39263 Memory(KB): 9961926
2017-02-23 12:28:47.057 Main:0x6506fb0 [ResourceManager] pool jvm - Queries: 10000 T
Comments
From the logs you uploaded, Vertica spent twenty minutes (12:08 to 12:28) reading DataCollector information. Have you significantly increased the data collector retention policy? If yes, that may be something to look into.
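In case it helps, a rough way to check this from vsql is to look at the configured DataCollector policies in the v_monitor.data_collector system table and, if needed, shrink the heaviest components. This is only a sketch; verify column names and component names against the 7.0.x documentation, and the 'ResourceAcquisitions' component below is just an example:

-- Components with the largest configured on-disk retention
SELECT component, memory_buffer_size_kb, disk_size_kb
FROM v_monitor.data_collector
ORDER BY disk_size_kb DESC
LIMIT 20;

-- Reduce retention for one component (values in KB; pick sizes that fit your monitoring needs)
SELECT set_data_collector_policy('ResourceAcquisitions', '1000', '10000');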
I am also facing the same issue. Did looking into the data collector retention policy solve it for you?
You want to check startup.log in the catalog directory (the same place as vertica.log) for more clues on why recovery is taking so long. Two known causes are a large catalog with millions of ROS files and large data collector retention policies. I would start with startup.log.
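If you want to rule out the large-catalog cause, a quick sanity check (assuming the standard v_monitor.storage_containers system table is available on your version) would be to count ROS containers per node; numbers in the millions would point to catalog size rather than DataCollector retention:

-- ROS container count per node
SELECT node_name, COUNT(*) AS ros_containers
FROM v_monitor.storage_containers
GROUP BY node_name
ORDER BY ros_containers DESC;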