Can Vertica Solve Parts and Assemblies?
How do I return the parts of a parent record?In this case, the expected rows are 1/1/2011, 2/5/2011 and 4/5/2011.
Given I want to start at the root of Thread=1.
ThisDate ChildDate Thread 1/1/2011 2/5/2013 1 1/15/2011 3/1/2013 2 1/16/2011 3/1/2013 3 2/5/2011 4/5/2013 4 3/1/2011 4/5/2013 5 4/5/2011 null 6
0
Comments
select firstvalue(Thisdate)
over (partition by ChildDate order by ChildDate) as "parent_record"
from Table
Let me know, if this works
Thanks for responding.
Could you clarify what you are trying to do here? How are you identifying a chain from this data? I'm not seeing what is special about exactly those three dates. (1/15/2011 has a different ChildDate than 1/1/2011, for example -- what additional factor causes it to be part of the 1/1/2011 record?
Navin's approach seems like a good start to me. Our LEAD() and LAG() analytic functions, as well as the more-powerful MATCH operator, can help in these cases. (MATCH can provide behavior similar to recursion.) But I don't yet understand what you're doing so I can't give a specific example.
Thanks,
Adam
The first date we find is 1/1/2011, this date has a child date of 2/5/2013. We then find its child date, 4/5/2011. And then we find its child date which is null so the chain is terminated.
Navin's approach doesn't solve the problem because it gets information outside of the chain. LEAD and LAG don't work because you have need to look forward/backward more than one row. I have no idea what MATCH does, i'll have to look into it.
So, to re-phrase, you want to order by Thread; then consider the the value of ChildDate in the first row; then select from that row until the same (or a larger?) value appears in ThisDate? Ignoring runs of rows with contiguous ChildDate values.
Here is maybe part of an ugly solution:
select mydata_inner.*, count(mydata_outer.thisdate) from
(select childdate, min(thisdate) as thisdate, min(thread) as thread from mydata group by childdate) as mydata_inner
left outer join mydata as mydata_outer on mydata_outer.childdate <= mydata_inner.thisdate
group by mydata_inner.thisdate, mydata_inner.childdate, mydata_inner.thread;
This gives you a fourth column "count" that increases every time you get a new "part". You can then use LAG() and filter by rows where count != lag(count):
select thisdate from (
select *, lag(count) over (order by thread) as lagged_count from (
select mydata_inner.*, count(mydata_outer.thisdate) from
(select childdate, min(thisdate) as thisdate, min(thread) as thread from mydata group by childdate) as mydata_inner
left outer join mydata as mydata_outer on mydata_outer.childdate <= mydata_inner.thisdate
group by mydata_inner.thisdate, mydata_inner.childdate, mydata_inner.thread
) t1 ) t2
where lagged_count != count or lagged_count is null;
But that's some really hairy SQL. And it is going to be slow. Computing all of these intermediate values is much more work than you really want to be doing. Maybe there are some SQL wizards around here who can help optimize it. I'm sure that, if you run our database optimizer, it will give you projections that will help with this at least a little (if you don't already have them).
The problem from our perspective is that there's a slightly tricky correlation between these columns; not one that's easy to tell SQL about so our engine will have a hard time making the assumptions that you want it to make in order to run this query fast.
In situations like these, one good approach is to write a User-Defined Transform function (C++ or Java, not SQL) using our SDK. Then you can bake those assumptions into your code -- have the function ask for the data sorted by Thread, then just keep track of first Thread and Child values as simple local variables, and write a few "if" statements for each row to encode the necessary logic.
MATCH might help here somewhat, but to solve the problem completely/generally, I think it would need to support match targets (ie., "this row has the same value as that previous row, whatever that value may have been"), and I believe we don't currently support that.
Let us know if any of that helps,
Adam