Can Vertica Solve Parts and Assemblies?

How do I return the parts of a parent record?
Given I want to start at the root of Thread=1.
ThisDate ChildDate Thread 1/1/2011 2/5/2013 1 1/15/2011 3/1/2013 2 1/16/2011 3/1/2013 3 2/5/2011 4/5/2013 4 3/1/2011 4/5/2013 5 4/5/2011 null 6
In this case, the expected rows are 1/1/2011, 2/5/2011 and 4/5/2011.

Comments

  • Navin_CNavin_C Vertica Customer
    Is this what you are looking fro


    select firstvalue(Thisdate)
    over (partition by ChildDate order by ChildDate) as "parent_record"
    from Table

    Let me know, if this works

  • No because it returns 1/15/11. It should not be returned because it's not part of the chain starting from thread = 1. The chain goes 1/1/2011 --> 2/5/2011 --> 4/5/2011. I think recursion is needed to solve this problem.

    Thanks for responding.
    with mydata(thisdate, childdate, thread) as ( 
      SELECT '1/1/2011'::date,   '2/5/2013'::date,  1
      UNION
      SELECT '1/15/2011'::date,  '3/1/2013'::date,  2
      UNION
      SELECT '1/16/2011'::date,  '3/1/2013'::date,  3
      UNION
      SELECT '2/5/2011'::date,   '4/5/2013'::date,  4
      UNION
      SELECT '3/1/2011'::date,   '4/5/2013'::date,  5
      UNION
      SELECT '4/5/2011'::date,   null::date,      6
    )
    select 
        first_value(Thisdate)
    over (partition by ChildDate order by ChildDate) as "parent_record"
    from mydata
    ;


  • Hi Laelsan,

    Could you clarify what you are trying to do here?  How are you identifying a chain from this data?  I'm not seeing what is special about exactly those three dates.  (1/15/2011 has a different ChildDate than 1/1/2011, for example -- what additional factor causes it to be part of the 1/1/2011 record?

    Navin's approach seems like a good start to me.  Our LEAD() and LAG() analytic functions, as well as the more-powerful MATCH operator, can help in these cases.  (MATCH can provide behavior similar to recursion.)  But I don't yet understand what you're doing so I can't give a specific example.

    Thanks,
    Adam
  • I'm trying to find all dates that belong to a specified starting thread. In this case, I want to start at thread=1 and get all child dates starting with that thread. It's like parts and assemblies, with a parentID/childID.

    The first date we find is 1/1/2011, this date has a child date of 2/5/2013. We then find its child date, 4/5/2011. And then we find its child date which is null so the chain is terminated.

    Navin's approach doesn't solve the problem because it gets information outside of the chain. LEAD and LAG don't work because you have need to look forward/backward more than one row. I have no idea what MATCH does, i'll have to look into it.
  • I totally just realized I messed up the data. I am so sorry. Fixing in one sec.
  • All dates should be 2011, not 2013.

    with mydata(thisdate, childdate, thread) as (
      SELECT '1/1/2011'::date,   '2/5/2011'::date,  1
      UNION
      SELECT '1/15/2011'::date,  '3/1/2011'::date,  2
      UNION
      SELECT '1/16/2011'::date,  '3/1/2011'::date,  3
      UNION
      SELECT '2/5/2011'::date,   '4/5/2011'::date,  4
      UNION
      SELECT '3/1/2011'::date,   '4/5/2011'::date,  5
      UNION
      SELECT '4/5/2011'::date,   null::date,      6
    )
    select 
        first_value(Thisdate)
    over (partition by ChildDate order by ChildDate) as "parent_record"
    from mydata
    ;
  • Specifically, I'm only trying to find the dates that are more than 6 months starting with the 1st date. Take the 1st date, go out 30 days and then find the first, next date. This date starts a new 30 day window. Rinse and repeat.
  • Ah, yeah, that update definitely helps.  Thanks!

    So, to re-phrase, you want to order by Thread; then consider the the value of ChildDate in the first row; then select from that row until the same (or a larger?) value appears in ThisDate?  Ignoring runs of rows with contiguous ChildDate values.

    Here is maybe part of an ugly solution:

    select mydata_inner.*, count(mydata_outer.thisdate) from
    (select childdate, min(thisdate) as thisdate, min(thread) as thread from mydata group by childdate) as mydata_inner
    left outer join mydata as mydata_outer on mydata_outer.childdate <= mydata_inner.thisdate
    group by mydata_inner.thisdate, mydata_inner.childdate, mydata_inner.thread;

    This gives you a fourth column "count" that increases every time you get a new "part".  You can then use LAG() and filter by rows where count != lag(count):

    select thisdate from (
    select *, lag(count) over (order by thread) as lagged_count from (
    select mydata_inner.*, count(mydata_outer.thisdate) from
    (select childdate, min(thisdate) as thisdate, min(thread) as thread from mydata group by childdate) as mydata_inner
    left outer join mydata as mydata_outer on mydata_outer.childdate <= mydata_inner.thisdate
    group by mydata_inner.thisdate, mydata_inner.childdate, mydata_inner.thread
    ) t1 ) t2
    where lagged_count != count or lagged_count is null;

    But that's some really hairy SQL.  And it is going to be slow.  Computing all of these intermediate values is much more work than you really want to be doing.  Maybe there are some SQL wizards around here who can help optimize it.  I'm sure that, if you run our database optimizer, it will give you projections that will help with this at least a little (if you don't already have them).

    The problem from our perspective is that there's a slightly tricky correlation between these columns; not one that's easy to tell SQL about so our engine will have a hard time making the assumptions that you want it to make in order to run this query fast.

    In situations like these, one good approach is to write a User-Defined Transform function (C++ or Java, not SQL) using our SDK.  Then you can bake those assumptions into your code -- have the function ask for the data sorted by Thread, then just keep track of first Thread and Child values as simple local variables, and write a few "if" statements for each row to encode the necessary logic.

    MATCH might help here somewhat, but to solve the problem completely/generally, I think it would need to support match targets (ie., "this row has the same value as that previous row, whatever that value may have been"), and I believe we don't currently support that.

    Let us know if any of that helps,

    Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file