statistics types

We have a process that maps data from one table to another and it fails when the statistics on the source table drop from FULL to ROWCOUNT. We can monitor this with selects on projection_columns. It seems when the data has changed enough by inserts/deletes that the stats become invalid and the db defaults them down to ROWCOUNT. When it does this the select on the source table in the mapper fails. Is there any documentation on what the difference is between FULL and ROWCOUNT? Has anyone else had this issue and are then any suggestions as what we can do maybe with hints on the select statement so it will work with just ROWCOUNT stats? The select joins between 4 other tables where the on conditions all use simple id = id conditions. When it fails it returns the following
Join inner did not fit in memory [(v_temp_schema.prequery x PROPBM.claimwarehouse) using prequery_b0 and claimwarehouse_new2_b0 (PATH ID: 9)].
We could enable JOIN_SPILL but that seems like a band-aid and not really solving the problem when proper statistics make it work.
As I write this I am thinking I could do an explain/profile when it inserts properly and see if I can give hints that mimic the good query plan. Your thoughts would be appreciated.
We are running Vertica 9.0

Tagged:

Answers

  • moshegmosheg Vertica Employee Administrator

    1) The type of statistics a column can contain:
    NONE: No statistics
    ROWCOUNT: Set by ANALYZE_ROW_COUNT
    FULL: Set by running ANALYZE_STATISTICS
    ANALYZE_ROW_COUNT is a lightweight operation that collects a minimal set of statistics and aggregate row counts for a projection, and saves it in the database catalog. In many cases, this data satisfies many optimizer requirements for producing optimal query plans.
    You can explicitly invoke ANALYZE_ROW_COUNT through calls to DO_TM_TASK. For example:
    SELECT DO_TM_TASK('analyze_row_count', 'store_orders_fact_b0');
    For more info see : https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Statistics/AnalyzeRowCounts.htm
    2) To reduce catalog size and shorten the time it takes to run ANALYZE_STATISTICS , narrow the scope of analysis to a subset of a table's columns which participate in your query predicate, and if relevant consider ANALYZE_STATISTICS_PARTITION use.
    3) Another option is Flattened Tables. One can create wide tables that combine all fact and dimension table columns to reduce the need to write joins and speed up query execution. See: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AnalyzingData/FlattenedTables/FlattenedTables.htm
    4) To save query plans for possible reuse, consider directed queries. There is also an option to export the queries plan as directed queries to an external SQL file, to reduce catalog overhead. For more info about this option see: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/QueryManagement/DirectedQueries/BatchExportImportTools.htm

  • emorenoemoreno Employee

    Hi,
    As Moshe mentioned, the FULL statistics is when you do analyze_statistics or analyze_histogram, however, ROWCOUNT is a service that runs by itself depends on the interval of AnalyzeRowCountInterval configuration parameter. There is no need for you to run manually.
    Statistics are used by the Optimizer when plan a query, it should be run in columns that you apply predicated, joins and groups by, it is not necessary to apply in metrics.
    Hope this helps,
    Eugenia

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file