Options

Vertica Column Statistics : Profiling Data

Hi,

I need to fetch -

1- Top 100 Distinct Values for Each Column (Including Min & Max) .

 

I'm interested in findng out - as a result of Analyze statistics/ Histogram does vertica stored such information any where? If yes then which system table/view?

 

If such information isn't stored by Default by vertica is there any less tedius way to write such queries to collect statitics/ data profiling on columns ?

Comments

  • Options

    Hi ,

    You basically have  two main approch to get this info 

    1. Query vs_projection_column_histogram table .
    2. Using EXPORT_STATISTICS API  , see https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/EXPORT_STATISTICS.htm

    Thanks 

  • Options

    Thanks. Output of Export Statistics has information I need. 

     

    I had created queries using this sql and stored them in File. And Execute these queries to Store data in a Table with  Another 'INSERT INTO XXXX ' appended around .. 

     

    Another QQ, Which Table information does EXPORT_STATISTICS function might be querying at.?

     

    Column_Storage doesn't seem to store MAX, MIN values.  ! And other solution you spoke about- Querying some Histogram storage table.. I couldn't find the table.. can you please write exact Name .. Thanks

     

    SELECT DISTINCT '
    || ''''||TABLE_NAME||''''||' AS TABLE_NAME ,'
    ||''''||COLUMN_NAME||''''||' AS COLUMN_NAME ,'
    ||COLUMN_NAME||' AS COLUMN_VAL ,
    MIN('||COLUMN_NAME||') OVER () AS MIN_VAL,
    MAX('||COLUMN_NAME||
    ') OVER () AS MAX_VAL
    FROM
    CHIM_BDI.'||TABLE_NAME||'
    ORDER BY '||COLUMN_NAME||
    ' DESC LIMIT 10 ) ;
    COMMIT ; '
    FROM
    V_CATALOG.COLUMNS
    WHERE
    TABLE_SCHEMA='xxxxx'
    AND TABLE_NAME LIKE 'xxxx'
    AND COLUMN_NAME NOT LIKE 'xxx%'
    AND COLUMN_NAME <>'xxx' AND DATA_TYPE NOT LIKE 'long varchar%' ;

  • Options

    Hi ,

    The internal implementation of EXPORT_STATISTICS is not publish .  

    However Vertica  probably querying the table i send you , in the previous replay   (you need dbadmin previlage in order to access its content )  .

    For Min & MAx , Vertica maintain it differently from what you probably know that exists on other DB products , in Vertica ( like Casandra )  the Min Max is managed on the level of the  ROS file  , to  see this info  (still you need dbadmin previlage in order to access its content)  you can query     select * from vs_ros_min_max_values

     

    I hope this answers all your questions

     

    Thanks

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file