How do I find the 3rd occurrence in a column and get the corresponding data?
I have a lookup table called networkdays, here is a sample of the data:
Date Vendor DayName DateFlag
2020-01-27 x Monday N
2020-01-28 x Tuesday Y
2020-01-29 x Wednesday Y
2020-01-30 x Thursday Y
2020-01-31 x Friday Y
2020-02-01 x Saturday N
2020-02-02 x Sunday N
2020-02-03 x Monday Y
2020-02-04 x Tuesday Y
2020-02-05 x Wednesday Y
2020-02-06 x Thursday Y
2020-02-07 x Friday Y
2020-02-08 x Saturday N
2020-02-09 x Sunday N
2020-02-10 x Monday Y
2020-02-11 x Tuesday Y
2020-02-12 x Wednesday Y
2020-02-13 x Thursday Y
2020-02-14 x Friday Y
2020-02-15 x Saturday N
2020-02-16 x Sunday N
There are multiple vendors with different DateFlag markers. The DateFlag indicates a day that particular vendor is working.
I have another table that has data about orders that have been placed and the networkdays table is there to give me a due date for when the order needs to be completed. Typically the vendors are allowed 3 working days to produce the order, so essentially I need to count 3 occurrences of DateFlag = 'Y' (not including date received) and then input the corresponding Date into this other set of data.
So an example would be
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
1 2020-01-27 x 2020-01-30
2 2020-01-28 x 2020-01-31
3 2020-01-29 x 2020-02-03
4 2020-01-30 x 2020-02-04
5 2020-01-31 x 2020-02-05
6 2020-02-01 x 2020-02-05
7 2020-02-02 x 2020-02-05
So being able to find the 3rd occurrence of 'Y' would help solve problems like this below:
Date Vendor DayName DateFlag
2019-12-21 x Saturday N
2019-12-22 x Sunday N
2019-12-23 x Monday Y
2019-12-24 x Tuesday N
2019-12-25 x Wednesday N
2019-12-26 x Thursday N
2019-12-27 x Friday Y
2019-12-28 x Saturday N
2019-12-29 x Sunday N
2019-12-30 x Monday Y
2019-12-31 x Tuesday N
2020-01-01 x Wednesday N
2020-01-02 x Thursday Y
2020-01-03 x Friday Y
2020-01-04 x Saturday N
2020-01-05 x Sunday N
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
8 2019-12-21 x 2019-12-30
I have not been able to find any example of how to accomplish this online.
Thank you in advance for any help with this.
Best Answer
-
Vertica_Curtis Employee
Ok, I see what's happening. I have a duedate column on my table. So I think the syntax is reverting to an order by on that column, and not on the alias of my nth_value. I think maybe your table doesn't have that column in it, so it's failing. But that's good, because it's actually not correct. If I change the name of my alias, I get the same error. This version should correct it:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;5
Answers
nth_value should be able to do it.
Your query would need to filter DateFlag ='Y' , but something like this:
select nth_value(date, 3) over (partition by vendor order by date) from NetworkDays where DateFlag = 'Y' and vendor = ''x' ;
Hello Vertica_Curtis,
I really appreciate your quick response on this. I am not 100% how to make what you have provided work. Here is what I am doing:
select
a.*,
nth_value(b.Date,3) over (partition by b.Vendor order by b.Date) DueDate
from schema.test a, schema.networkdays b
where
b.DateFlag ='Y' and
a.VendorId = b.Vendor and
a.InvoicedTime::date = b.Date
Here is a sample of the output right now:
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
1 2020-01-30 x [Null]
2 2020-01-30 x 2020-01-30
3 2020-01-31 x 2020-01-30
4 2020-02-03 x 2020-01-30
5 2020-02-04 x 2020-01-30
So I am getting a null value or only the same date over and over in the field.
Thank you again for your help
Can you post a few sample rows from your two tables. I can try to recreate it in my environment.
I attached the full Networkdays table and a sample output from the test table. Please let me know if you need anything else.
Thank you
Something like this should get it:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by duedate ) ;
Your query has = to on the dates. You need to do >= so you can get all the dates into the query, in order to find the 3rd one. As an analytic, nth_value is going to return all the dates, so I'm using a limit 1 here, since the nth_value has already found the one you're interested in.
Thank you Vertica_Curtis, it is looking really promising so far. I haven't tested all the dates that I know are problem ones yet, but the initial look is really good. Thank you so much for this.
So I did find a case where it doesn't return the correct date. An order that is invoiced on 12/21/19 for vendor j should have a due date of 12/30/19. Here is the data from the networkdays table:
12/21/2019 j Saturday N
12/22/2019 j Sunday N
12/23/2019 j Monday Y
12/24/2019 j Tuesday N
12/25/2019 j Wednesday N
12/26/2019 j Thursday N
12/27/2019 j Friday Y
12/28/2019 j Saturday N
12/29/2019 j Sunday N
12/30/2019 j Monday Y
12/31/2019 j Tuesday N
When I ran the query you gave, it output 12/27/19 for the due date. I had to modify your query slightly because it gave me an error stating that the column due date didn't exist in the limit 1 over portion so I replaced it with a.InvoicedTime:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
If I do this then it gives the expected output:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
Any clue why that happens?
Looks like you changed it to the 5th value, not the 3rd. And you've adjusted the partition of your analytic to include shipBinId, which seems a little weird to me. Those are definitely going to affect all rows in how it behaves. What error were you getting?
When I run this:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by duedate ) ;
SQL Error [2624] [42703]: [Vertica]VJDBC ERROR: Column "duedate" does not exist
So then when I change it to this:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;
I get:
Orderid InvoicedTime Vendorid DueDate
9 2019-12-21 06:34:43 j 2019-12-27
If I modify the query to:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;
I get:
Orderid InvoicedTime Vendorid DueDate
9 2019-12-21 06:34:43 j 2019-12-30
just like I am supposed to per the networkdays table:
12/21/2019 j Saturday N
12/22/2019 j Sunday N
12/23/2019 j Monday Y
12/24/2019 j Tuesday N
12/25/2019 j Wednesday N
12/26/2019 j Thursday N
12/27/2019 j Friday Y
12/28/2019 j Saturday N
12/29/2019 j Sunday N
12/30/2019 j Monday Y
12/31/2019 j Tuesday N
12/30 is the expected result but when I run nth_value(b.Date, 3), I do not get the correct result and I don't understand why.
Thank you again for your help.
That query did fix the problem of the column not existing so thank you for that, but the output remains the same:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
Orderid InvoicedTime Vendorid asdfghjkl
9 2019-12-21 06:34:43 j 2019-12-27
If I change it to 4:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 4) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
I get:
Orderid InvoicedTime Vendorid asdfghjkl
9 2019-12-21 06:34:43 j 2019-12-27
But only when I change it to 5 does it give me the correct result:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
Orderid InvoicedTime Vendorid asdfghjkl
9 2019-12-21 06:34:43 j 2019-12-30
When you run the query, what is your output?
Answered in a different thread.