Database Space

Hi,
as in vertica documentation , it has mentioned -
HP Vertica excludes,
  • Multiple projections (underlying physical copies) of data from a logical database entity (table). Data appearing in multiple projections of the same table is counted only once.
but what if anyone create a projection using some kind of join clause.
will HP Vertica excludes this new projection? 

Comments

  • Are you talking bout the occupied space in your file storage system or the Vertica License Utilization(row data) ?

    When you are talking about the space used by the projection created in you database, is good to know they will occupy (extra - beside the actual super projection)space.
    Here is a demo example:

    dbadmin=> \dt
                   List of tables
     Schema | Name  | Kind  |  Owner  | Comment
    --------+-------+-------+---------+---------
     public | btest | table | dbadmin |
    (1 row)
    dbadmin=> \dj
                   List of projections
     Schema |    Name     |  Owner  | Node | Comment
    --------+-------------+---------+------+---------
     public | btest_super | dbadmin |      |
    (1 row)
    dbadmin=> \d btest;
                                       List of Fields by Tables
     Schema | Table | Column |    Type     | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------+--------+-------------+------+---------+----------+-------------+-------------
     public | btest | keys   | varchar(10) |   10 |         | f        | f           |
     public | btest | values | varchar(10) |   10 |         | f        | f           |
    (2 rows)
    dbadmin=> select sum(USED_BYTES) 
            from STORAGE_CONTAINERS 
    where PROJECTION_NAME like 'btest%';
     sum
    -----
     114
    (1 row)
    dbadmin=> create projection btest_values as select values from btest;
    CREATE PROJECTION
    dbadmin=> \dj
                         List of projections
     Schema |     Name     |  Owner  |      Node       | Comment
    --------+--------------+---------+-----------------+---------
     public | btest_super  | dbadmin |                 |
     public | btest_values | dbadmin | v_test_node0001 |
    (2 rows)
    dbadmin=> select refresh('');
    dbadmin=> \dt
                   List of tables
     Schema | Name  | Kind  |  Owner  | Comment
    --------+-------+-------+---------+---------
     public | btest | table | dbadmin |
    (1 row)
    dbadmin=> \dj
                         List of projections
     Schema |     Name     |  Owner  |      Node       | Comment
    --------+--------------+---------+-----------------+---------
     public | btest_super  | dbadmin |                 |
     public | btest_values | dbadmin | v_test_node0001 |
    (2 rows)
    dbadmin=> select sum(USED_BYTES),PROJECTION_NAME 
             from STORAGE_CONTAINERS
             where PROJECTION_NAME like 'btest%' group by 2;
     sum | PROJECTION_NAME
    -----+-----------------
     114 | btest_super
      78 | btest_values
    (2 rows)
    Now when talking about the Raw Vertica License - the data once dumped into Vertica it is calculated(to the license total) once. Once inside the database you can create as many projecitons(buddy of segmented) as you.
    Also the compression(benefits/gains) is not applied to the raw license data calculation. 

  • Thanks, for your response.
    This one is fine. 
    But i am talking about something like,
    create table table1(x varchar primary key not null,y varchar).
    create table table2(x varchar primary key not null,y varchar)
    create projection  table1_proj_1 as select x from table1;
    create projection table1_proj_2 as select y from table1;

    i know HP Vertica excludes, these projections as well as buddy projections while auditing database size.

    but if i do 
    create projection proj1_2 as select a.x,b.y from table1 a join table2 b on (a.x=b.x);

    will it be incliuded as raw data size ?

    Thanks,

  • Navin_CNavin_C Vertica Customer
    Hi Naveen,

    The projection proj1_2 size will not be counted in raw data size.

    As in Vertica Raw data size means whatever data you have loaded/dumped into vertica.

    Creating a projections manually will not increase the raw data size, because it is created internally by you and not loaded/dumped, but it will increase the compressed data size of database.

    The used_bytes column in the projection_storage table shows the compressed data size of the projections.

    Hope this helps
  • Acurate answer by Navin.
    No License data will be used by internal copy of the data.
    Even because Vertica will still be using less space internally then your loaded RAW file size. Of course this depends of the data type and the used encoding
    Take this scenario :
    You have a raw file of 728M(25 millions rows) holding a table called personal_data with columns(name,gender,country).
     After you load this one on you db the file size will be 90% smaller because of you RLE encoding applied on you gender and country fields.
    So now you have a table that has 0,2M
    Se example:
    \set :raw_file_size 'du -b person | awk {'print $1'}' select :raw_file_size as "File Raw Size",        used_bytes as "Projection Space Usage",        used_bytes / :raw_file_size * 100 as Porcent   from STORAGE_CONTAINERS; File Raw Size | Projection Space Usage |       Porcent ---------------+------------------------+----------------------      762513950 |                 217408 | 0.028512002960732700 (1 row) 
     See RLE encoding gains (is like 99% saving in space)
    as 
    dbadmin=> select count(distinct(country)),count(distinct(gender)),count(distinct(name)),count(*) from person;   count | count | count |  count
    -------+-------+-------+----------
       204 |     2 |  6586 | 25607196


Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file