How do I find the 3rd occurrence in a column and get the corresponding data?
I have a lookup table called networkdays, here is a sample of the data:
Date Vendor DayName DateFlag
20200127 x Monday N
20200128 x Tuesday Y
20200129 x Wednesday Y
20200130 x Thursday Y
20200131 x Friday Y
20200201 x Saturday N
20200202 x Sunday N
20200203 x Monday Y
20200204 x Tuesday Y
20200205 x Wednesday Y
20200206 x Thursday Y
20200207 x Friday Y
20200208 x Saturday N
20200209 x Sunday N
20200210 x Monday Y
20200211 x Tuesday Y
20200212 x Wednesday Y
20200213 x Thursday Y
20200214 x Friday Y
20200215 x Saturday N
20200216 x Sunday N
There are multiple vendors with different DateFlag markers. The DateFlag indicates a day that particular vendor is working.
I have another table that has data about orders that have been placed and the networkdays table is there to give me a due date for when the order needs to be completed. Typically the vendors are allowed 3 working days to produce the order, so essentially I need to count 3 occurrences of DateFlag = 'Y' (not including date received) and then input the corresponding Date into this other set of data.
So an example would be
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
1 20200127 x 20200130
2 20200128 x 20200131
3 20200129 x 20200203
4 20200130 x 20200204
5 20200131 x 20200205
6 20200201 x 20200205
7 20200202 x 20200205
So being able to find the 3rd occurrence of 'Y' would help solve problems like this below:
Date Vendor DayName DateFlag
20191221 x Saturday N
20191222 x Sunday N
20191223 x Monday Y
20191224 x Tuesday N
20191225 x Wednesday N
20191226 x Thursday N
20191227 x Friday Y
20191228 x Saturday N
20191229 x Sunday N
20191230 x Monday Y
20191231 x Tuesday N
20200101 x Wednesday N
20200102 x Thursday Y
20200103 x Friday Y
20200104 x Saturday N
20200105 x Sunday N
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
8 20191221 x 20191230
I have not been able to find any example of how to accomplish this online.
Thank you in advance for any help with this.
Best Answer

OptionsVertica_Curtis Employee
Ok, I see what's happening. I have a duedate column on my table. So I think the syntax is reverting to an order by on that column, and not on the alias of my nth_value. I think maybe your table doesn't have that column in it, so it's failing. But that's good, because it's actually not correct. If I change the name of my alias, I get the same error. This version should correct it:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;5
Answers
nth_value should be able to do it.
Your query would need to filter DateFlag ='Y' , but something like this:
select nth_value(date, 3) over (partition by vendor order by date) from NetworkDays where DateFlag = 'Y' and vendor = ''x' ;
Hello Vertica_Curtis,
I really appreciate your quick response on this. I am not 100% how to make what you have provided work. Here is what I am doing:
select
a.*,
nth_value(b.Date,3) over (partition by b.Vendor order by b.Date) DueDate
from schema.test a, schema.networkdays b
where
b.DateFlag ='Y' and
a.VendorId = b.Vendor and
a.InvoicedTime::date = b.Date
Here is a sample of the output right now:
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
1 20200130 x [Null]
2 20200130 x 20200130
3 20200131 x 20200130
4 20200203 x 20200130
5 20200204 x 20200130
So I am getting a null value or only the same date over and over in the field.
Thank you again for your help
Can you post a few sample rows from your two tables. I can try to recreate it in my environment.
I attached the full Networkdays table and a sample output from the test table. Please let me know if you need anything else.
Thank you
Something like this should get it:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by duedate ) ;
Your query has = to on the dates. You need to do >= so you can get all the dates into the query, in order to find the 3rd one. As an analytic, nth_value is going to return all the dates, so I'm using a limit 1 here, since the nth_value has already found the one you're interested in.
Thank you Vertica_Curtis, it is looking really promising so far. I haven't tested all the dates that I know are problem ones yet, but the initial look is really good. Thank you so much for this.
So I did find a case where it doesn't return the correct date. An order that is invoiced on 12/21/19 for vendor j should have a due date of 12/30/19. Here is the data from the networkdays table:
12/21/2019 j Saturday N
12/22/2019 j Sunday N
12/23/2019 j Monday Y
12/24/2019 j Tuesday N
12/25/2019 j Wednesday N
12/26/2019 j Thursday N
12/27/2019 j Friday Y
12/28/2019 j Saturday N
12/29/2019 j Sunday N
12/30/2019 j Monday Y
12/31/2019 j Tuesday N
When I ran the query you gave, it output 12/27/19 for the due date. I had to modify your query slightly because it gave me an error stating that the column due date didn't exist in the limit 1 over portion so I replaced it with a.InvoicedTime:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
If I do this then it gives the expected output:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
Any clue why that happens?
Looks like you changed it to the 5th value, not the 3rd. And you've adjusted the partition of your analytic to include shipBinId, which seems a little weird to me. Those are definitely going to affect all rows in how it behaves. What error were you getting?
When I run this:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by duedate ) ;
SQL Error [2624] [42703]: [Vertica]VJDBC ERROR: Column "duedate" does not exist
So then when I change it to this:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;
I get:
Orderid InvoicedTime Vendorid DueDate
9 20191221 06:34:43 j 20191227
If I modify the query to:
SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) DueDate
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y'
limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;
I get:
Orderid InvoicedTime Vendorid DueDate
9 20191221 06:34:43 j 20191230
just like I am supposed to per the networkdays table:
12/21/2019 j Saturday N
12/22/2019 j Sunday N
12/23/2019 j Monday Y
12/24/2019 j Tuesday N
12/25/2019 j Wednesday N
12/26/2019 j Thursday N
12/27/2019 j Friday Y
12/28/2019 j Saturday N
12/29/2019 j Sunday N
12/30/2019 j Monday Y
12/31/2019 j Tuesday N
12/30 is the expected result but when I run nth_value(b.Date, 3), I do not get the correct result and I don't understand why.
Thank you again for your help.
That query did fix the problem of the column not existing so thank you for that, but the output remains the same:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
Orderid InvoicedTime Vendorid asdfghjkl
9 20191221 06:34:43 j 20191227
If I change it to 4:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 4) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
I get:
Orderid InvoicedTime Vendorid asdfghjkl
9 20191221 06:34:43 j 20191227
But only when I change it to 5 does it give me the correct result:
select *
from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
, nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) as asdfghjkl
FROM nth_test a
JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
WHERE b.DateFlag = 'Y') foo
limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
Orderid InvoicedTime Vendorid asdfghjkl
9 20191221 06:34:43 j 20191230
When you run the query, what is your output?
Answered in a different thread.