We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Vertica Column Statistics : Profiling Data — Vertica Forum

Vertica Column Statistics : Profiling Data

Hi,

I need to fetch -

1- Top 100 Distinct Values for Each Column (Including Min & Max) .

 

I'm interested in findng out - as a result of Analyze statistics/ Histogram does vertica stored such information any where? If yes then which system table/view?

 

If such information isn't stored by Default by vertica is there any less tedius way to write such queries to collect statitics/ data profiling on columns ?

Comments

  • Hi ,

    You basically have  two main approch to get this info 

    1. Query vs_projection_column_histogram table .
    2. Using EXPORT_STATISTICS API  , see https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/EXPORT_STATISTICS.htm

    Thanks 

  • Thanks. Output of Export Statistics has information I need. 

     

    I had created queries using this sql and stored them in File. And Execute these queries to Store data in a Table with  Another 'INSERT INTO XXXX ' appended around .. 

     

    Another QQ, Which Table information does EXPORT_STATISTICS function might be querying at.?

     

    Column_Storage doesn't seem to store MAX, MIN values.  ! And other solution you spoke about- Querying some Histogram storage table.. I couldn't find the table.. can you please write exact Name .. Thanks

     

    SELECT DISTINCT '
    || ''''||TABLE_NAME||''''||' AS TABLE_NAME ,'
    ||''''||COLUMN_NAME||''''||' AS COLUMN_NAME ,'
    ||COLUMN_NAME||' AS COLUMN_VAL ,
    MIN('||COLUMN_NAME||') OVER () AS MIN_VAL,
    MAX('||COLUMN_NAME||
    ') OVER () AS MAX_VAL
    FROM
    CHIM_BDI.'||TABLE_NAME||'
    ORDER BY '||COLUMN_NAME||
    ' DESC LIMIT 10 ) ;
    COMMIT ; '
    FROM
    V_CATALOG.COLUMNS
    WHERE
    TABLE_SCHEMA='xxxxx'
    AND TABLE_NAME LIKE 'xxxx'
    AND COLUMN_NAME NOT LIKE 'xxx%'
    AND COLUMN_NAME <>'xxx' AND DATA_TYPE NOT LIKE 'long varchar%' ;

  • Hi ,

    The internal implementation of EXPORT_STATISTICS is not publish .  

    However Vertica  probably querying the table i send you , in the previous replay   (you need dbadmin previlage in order to access its content )  .

    For Min & MAx , Vertica maintain it differently from what you probably know that exists on other DB products , in Vertica ( like Casandra )  the Min Max is managed on the level of the  ROS file  , to  see this info  (still you need dbadmin previlage in order to access its content)  you can query     select * from vs_ros_min_max_values

     

    I hope this answers all your questions

     

    Thanks

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file