Default segmentation behavior differs in Create Table and Create Projection?

Hi - I am trying to understand how Vertica handles "segmentation" option if I don't specify one. The problem I have is - our application generates DDLs on the fly (programatically). I am trying to find out what is a good policy to implement for medium and small workloads. By policy I mean whether our Application specifies how to segment on what column to what nodes, vs. leave everything to Vertica server.

What I found out so far seems to suggest that -
1. Without a "segmented by" or "unsegmented" clause, the create TABLE statement will create a Segmented superprojection on all nodes.
2. However, if I create a similar projection (e.g. select * from table_x order by) without the segmentation option, the create PROJECTION statement will create a non-segmented on all nodes.

Is this the expected behavior? I did read both the Admin and SQLRef portion but didn't find an answer. A colleague read somewhere that if we don't specify the segmentation option, Vertica may add one if the table is too large. Is that so?

Thanks,

Bing




Comments

  • The default was switched from version 5.x to 6.x to be segmented. In 5.x, the default was unsegmented (i.e., replication), either default is problematic for various reasons, and both should be managed.

    You can supply the segmentation clause in the create table statement if you want.
    https://my.vertica.com/docs/6.1.x/HTML/index.htm#1293.htm

    You can find the following information on default segmentation strategies, here:
    https://my.vertica.com/docs/6.1.x/HTML/index.htm#1293.htm
    (scroll down to "Characteristics of default automatic projections")

    Bottom line is - it's rarely a good idea to allow the defaults define your physical schema. It's fine in isolation, but tables that are joined together, that have differing segmentation values will not join as efficiently as they could if they were the same.  So it's definitely worth taking the time to define a cohesive strategy up front.

  • about the default  charater of automatic projection in Version6.1.X :
    >It is a superprojection.
    >It uses the default encoding-type AUTO.
    >If created as a result of a CREATE TABLE AS SELECT statement, uses the encoding specified in the query table.
    >Auto-projections use hash segmentation.
    ......

    two questions:
    1 if anyone can change the default charater? for example, change the default automatic projection to unsegmented. if have, how to change it?

    2 in the future version, if Vertica will stick to these default charaters


  • Hi!
    change the default automatic projection to unsegmented. if have, how to change it?
    Partial answer:

    For DBD:
    you can run "select disable_elastic_cluster()". In this case DBD will create only a replicated projections (v7, about v6 - I didn't checked)


    For common usage:
    General Parameters >> MaxAutoSegColumns
    Specifies the number of columns (0 - 1024) to segment automatically when creating auto-projections from COPY and INSERT INTO statements. Setting this parameter to zero (0) indicates to use all columns in the hash segmentation expression.
    daniel=> select get_config_parameter('MaxAutoSegColumns');
    get_config_parameter
    ----------------------
    32
    (1 row)

    daniel=> select set_config_parameter('MaxAutoSegColumns', 1);
    set_config_parameter
    ----------------------------
    Parameter set successfully
    (1 row)
  • thank you Daniel. Do you mean if disable the elastic_cluster, then create a table and insert records to this new tables, the default supper projection of this table would be  a replicated one
  • Hi Vincent!

    No, I mean if elastic cluster is disabled, so DBD will create only replicated projection, but its DBD only. This is a workaround to create MODULARHASH segmentation (don't do a deploy >> edit deploy script >> add disable in the beginning and enable in the end).

    Manually everything works correct.


    Thanks to Harshad Dhavale for info:
    When Elastic Cluster (EC) is enabled, (which it seems to be by default), modularhash is replaced with hash (because modularhash doesn't work with elastic cluster projection segmentation). There is an internal bug request open for this and to document this behavior in the docs. 
    I did testing to verify this behavior: 

    Before disabling elastic cluster: 

    create table targ1 (a int); 
    create projection targ1_p1 as select * from targ1 segmented by modularhash(a) all nodes; 
    create projection targ1_p2 as select * from targ1 segmented by modularhash(a) all nodes offset 1; 
    SELECT EXPORT_OBJECTS('','public.targ1',false); 

    RESULT: 

    CREATE PROJECTION public.targ1_p1 -----------> SEGMENTED BY hash(targ1.a) ALL NODES ; 
    CREATE PROJECTION public.targ1_p2 -----------> SEGMENTED BY hash(targ1.a) ALL NODES OFFSET 1; 


    Disabled Elastic Cluster:

    select disable_elastic_cluster(); 


    After disabling elastic cluster: 

    create table targ2 (a int); 
    create projection targ2_p1 as select * from targ2 segmented by modularhash(a) all nodes; 
    create projection targ2_p2 as select * from targ2 segmented by modularhash(a) all nodes offset 1; 
    SELECT EXPORT_OBJECTS('','public.targ2',false); 

    RESULT: 

    CREATE PROJECTION public.targ2_p1 -----------> SEGMENTED BY modularhash(targ2.a) ALL NODES ; 
    CREATE PROJECTION public.targ2_p2 -----------> SEGMENTED BY modularhash(targ2.a) ALL NODES OFFSET 1; 



    So Elastic Cluster needs to be disabled using "select disable_elastic_cluster();" in order for Segmentation to take Modular Hash.
  • Daniel,  thanks a lot

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file