When we run a query to check when a table was last accessed, how far back in time does it look?
vasanthvk
Vertica Customer
Hi,
When we run a query to check when a table was last accessed, how far back in time does it look? Is it a year, a month? Is there a way to check the last access time on a view as well?
SELECT table_name, user_name, MAX(time) AS last_access FROM dc_projections_used WHERE table_schema = '<schema_name>' AND table_name = '<table_name>' GROUP BY table_name, user_name;
Answers
When we run a query to check when a table was last accessed, how far back in time does it look?
How far back the query can look depends on the Data Collector retention settings for the ProjectionsUsed component.
SELECT DISTINCT table_name, component, description FROM data_collector where table_name ilike '%projections_used%';
table_name | component | description
---------------------+-----------------+---------------------------------------------
dc_projections_used | ProjectionsUsed | Projections used in each SQL request issued
(1 row)
The get_data_collector_policy function can be used to retrieve information about the retention policy for a specific data collector component like so:
SELECT get_data_collector_policy('ProjectionsUsed');
Is there a way to check the last access time on a view as well?
To check when a view was created, run:
SELECT table_schema, table_name, create_time FROM views WHERE table_schema = 'my_view_schema' AND table_name = 'my_view_name';
To check when a specific view was last accessed, try this:
select * from QUERY_REQUESTS where REQUEST ilike '%YourViewName%' and REQUEST not ilike '%from QUERY_REQUESTS%' order by END_TIMESTAMP DESC limit 1;
Please note that after you run analyze_statistics on a table, the table no longer appears as unused.
To avoid this, query dc_projections_used and run analyze_statistics only on used tables.
E.g.: SELECT DISTINCT 'select analyze_statistics(''' || table_schema || '.' || table_name || ''');' FROM dc_projections_used WHERE is_virtual = 'false';
Thank you for the detailed explanation.
Great info, thanks, very useful.
You can check how far back the data actually goes in a Data Collector table. The data_collector view has a first_time column, which is the timestamp of the oldest record in the DC table on that node. Because Data Collector tables are purged independently on each node, take max(first_time) across nodes to find the oldest timestamp for which you have complete information.
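As a rough sketch of that check (the alias name is just illustrative): the first query shows the per-node span, the second the oldest point still covered by every node.
SELECT node_name, first_time FROM data_collector WHERE table_name = 'dc_projections_used' ORDER BY first_time;
SELECT MAX(first_time) AS complete_history_starts FROM data_collector WHERE table_name = 'dc_projections_used';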
A quick check on my busy cluster shows that data in dc_projections_used goes back 5 days.
Data in dc_requests_issued also goes back 5 days, though I already see queries older than 5 hours missing.
Of note, you can adjust the retention policy on the data collector tables by size and/or time, so if your system is very busy and data rolls off too fast, you can increase the retention to keep the amount you need. See the "Configuring Data Retention Policies" page in the docs for details on the get and set functions related to retention. If you choose to increase the size or time, be mindful that the files on disk for that data collector table can get large, so it's good to monitor them for a while to make sure they don't bloat the Vertica footprint. They are located in the catalog directory on each node. There may be between 1 and 4 log files (each with a corresponding json file) on each host for each data collector table, depending on the amount of data currently retained.
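As an illustrative sketch of the set functions mentioned above (the limits here are made up; check the docs page for the exact arguments your version accepts):
-- size-based retention: memory KB and disk KB per node for the component
SELECT SET_DATA_COLLECTOR_POLICY('ProjectionsUsed', '2000', '200000');
-- or time-based retention
SELECT SET_DATA_COLLECTOR_TIME_POLICY('ProjectionsUsed', '30 days'::INTERVAL);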
You can see here that my ProjectionsUsed currently has only 1 file, as this is a new database with minimal queries running. But ProcessInfoBySecond is tracked constantly, so it has 4 log files and corresponding json files.
-rw-------. 1 dbadmin verticadba 6552587 Dec 21 10:09 ProcessInfoBySecond_20221221T142929.001106Z_VMart_v12.0.0-0_v_vmart_node0001.log
-rw-------. 1 dbadmin verticadba 85 Dec 21 10:09 ProcessInfoBySecond_20221221T142929.001106Z_VMart_v12.0.0-0_v_vmart_node0001.log.json
-rw-------. 1 dbadmin verticadba 6552546 Dec 21 10:49 ProcessInfoBySecond_20221221T150921.001170Z_VMart_v12.0.0-0_v_vmart_node0001.log
-rw-------. 1 dbadmin verticadba 85 Dec 21 10:49 ProcessInfoBySecond_20221221T150921.001170Z_VMart_v12.0.0-0_v_vmart_node0001.log.json
-rw-------. 1 dbadmin verticadba 6552548 Dec 21 11:29 ProcessInfoBySecond_20221221T154913.001222Z_VMart_v12.0.0-0_v_vmart_node0001.log
-rw-------. 1 dbadmin verticadba 85 Dec 21 11:29 ProcessInfoBySecond_20221221T154913.001222Z_VMart_v12.0.0-0_v_vmart_node0001.log.json
-rw-------. 1 dbadmin verticadba 3961131 Dec 21 11:53 ProcessInfoBySecond_20221221T162905.001535Z_VMart_v12.0.0-0_v_vmart_node0001.log
-rw-------. 1 dbadmin verticadba 17913229 Dec 20 11:57 ProjectionsUsed_20221123T132838.993317Z_VMart_v12.0.0-0_v_vmart_node0001.log
Note in the above file listing that the 3 oldest ProcessInfoBySecond files are all the same size, and the 4th is smaller. Vertica creates the first file, and when it reaches the size or time limit it creates a new one. It continues this until there are 4 in total. Then, when the last one reaches the policy limit, the oldest is rolled off and a new one is created, so it's a constantly rotating set of files. In the listing above you can see that they rotate out about every 40 minutes. Different tables will fill and rotate at different intervals based on database activity and what's being tracked in the table.
I hope it helps.
On an unrelated note... yes, the data collector log files work exactly as you described.
It would make a lot of sense to compress old (i.e. not currently being written) log files with zstd.
I just tried compressing one ProjectionsUsed log file with zstd, and it compressed to 3.5% of its original size.
That would be a huge disk space saver: in the 3 GB allocated for the data collector, roughly 30 times more data would fit (1 / 0.035 ≈ 29).
Please consider filing an internal feature request to compress old data collector log files with zstd.