Hi guys,

I've been having a bit of trouble here. My table consists of three columns - key1, key2 and count. What I would like to do is check if an entry exists for the combination of key1=some_value and key2=some_other_value. If it does, then I would like to increment the counter. If it doesn't, then I would like to insert a row for that pair of keys.

The closest I've come to doing this is by basically using this approach in what seems like a weird way. http://www.vertica.com/2012/08/27/load-new-data-and-modify-data-simultaneously/

This is what I've been doing:
- Insert a row into a temporary table
- Merge that temp table into the real table
- Commit the changes
- Truncate the temp table
- Repeat with the next entry

This was working but didn't seem very efficient as I was doing everything one entry at a time. I tried inserting a bunch of entries into the temp table, then merging them all at once, but instead of the counters going up as expected, I would instead get duplicate rows. I was hoping someone here could help me out.



  • Options
    #for example
    create table TBL(k1 int, k2 int, ...);

    # order will increase performance for group by counting
    create projection TBL_UPDATE (...)... order by k1, k2...;

    # copy data as is and count matched entries
    create view TBL_UP as select k1, k2, count(*) from TBL where (k1,k2) = (v1,v2) group by k1, k2;
    create view TBL_UP as select k1, k2, sum(((k1,k2) = (v1,v2))::int) from TBL group by k1, k2;

    #or you can play with it Vertica specific functions

    Also you can try:
    * create table TBL_UPDATED like TBL including projections
    * populate TBL_UPDATED with insert/select...(like view from above) >> swap tables >> drop "old"
    In any case COPY data as is without UPDATED/DELETES/MERGES will be faster.

    If projection is "suitable" for query/view, sometimes to create an updates data VIEW much more faster than to update it. If key1 and key2 have low cardinality  and projection is optimized for their counting then probably VIEW will be a fastest way.
  • Options
    Hi Daniel,

    This worked perfectly, thanks. When I created my view, I used "from TBL_UPDATE" rather than "from TBL", because it didn't seem that the projection was being used with the way you had written it. I imagine that was just a typo, but maybe not?

    Anyways, thanks again.
  • Options
    May be optimizer doesn't understand my syntax, try standard:
    ... where key1 = value1 and key2 = value
    Also you can define a projection as source too, ie:
    # avd and dis: historical data + no updates, but takes a storage
    create schema rawdata;

    # create a raw data table
    create table rawdata.fact (row_id int, sid int, pid int, oid int, cid int, income money);

    # create suitable projection
    create projection rawdata.prj_fact_update_optimized (grouped(oid encoding rle, cid encoding rle)) as select * from rawdata.fact order by oid, cid;

    # create public VIEW as select directly from projection
    create view public.fact as select oid, cid, count(*) as "OID,CID", income from rawdata.prj_fact_super_update_optimized group by oid, cid;

    Some think like this.
    Optimizer choose a plan by COST and not by time execution. May be its a reason for choosing other projection.
    Also previously I provided a very simple example, here its little more complicated - explicit grouping of columns, encoding definition. Other columns also require encoding. Play with it.
    ... grouped(oid encoding rle, cid encoding rle) ...
  • Options
    Thanks again Daniel!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file