The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!
Vertica Column Statistics : Profiling Data
Hi,
I need to fetch -
1- Top 100 Distinct Values for Each Column (Including Min & Max) .
I'm interested in findng out - as a result of Analyze statistics/ Histogram does vertica stored such information any where? If yes then which system table/view?
If such information isn't stored by Default by vertica is there any less tedius way to write such queries to collect statitics/ data profiling on columns ?
0
Comments
Hi ,
You basically have two main approch to get this info
Thanks
Thanks. Output of Export Statistics has information I need.
I had created queries using this sql and stored them in File. And Execute these queries to Store data in a Table with Another 'INSERT INTO XXXX ' appended around ..
Another QQ, Which Table information does EXPORT_STATISTICS function might be querying at.?
Column_Storage doesn't seem to store MAX, MIN values. ! And other solution you spoke about- Querying some Histogram storage table.. I couldn't find the table.. can you please write exact Name .. Thanks
SELECT DISTINCT '
|| ''''||TABLE_NAME||''''||' AS TABLE_NAME ,'
||''''||COLUMN_NAME||''''||' AS COLUMN_NAME ,'
||COLUMN_NAME||' AS COLUMN_VAL ,
MIN('||COLUMN_NAME||') OVER () AS MIN_VAL,
MAX('||COLUMN_NAME||
') OVER () AS MAX_VAL
FROM
CHIM_BDI.'||TABLE_NAME||'
ORDER BY '||COLUMN_NAME||
' DESC LIMIT 10 ) ;
COMMIT ; '
FROM
V_CATALOG.COLUMNS
WHERE
TABLE_SCHEMA='xxxxx'
AND TABLE_NAME LIKE 'xxxx'
AND COLUMN_NAME NOT LIKE 'xxx%'
AND COLUMN_NAME <>'xxx' AND DATA_TYPE NOT LIKE 'long varchar%' ;
Hi ,
The internal implementation of EXPORT_STATISTICS is not publish .
However Vertica probably querying the table i send you , in the previous replay (you need dbadmin previlage in order to access its content ) .
For Min & MAx , Vertica maintain it differently from what you probably know that exists on other DB products , in Vertica ( like Casandra ) the Min Max is managed on the level of the ROS file , to see this info (still you need dbadmin previlage in order to access its content) you can query select * from vs_ros_min_max_values
I hope this answers all your questions
Thanks