The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.

How do I find the 3rd occurrence in a column and get the corresponding data?

I have a lookup table called networkdays, here is a sample of the data:
Date Vendor DayName DateFlag
2020-01-27 x Monday N
2020-01-28 x Tuesday Y
2020-01-29 x Wednesday Y
2020-01-30 x Thursday Y
2020-01-31 x Friday Y
2020-02-01 x Saturday N
2020-02-02 x Sunday N
2020-02-03 x Monday Y
2020-02-04 x Tuesday Y
2020-02-05 x Wednesday Y
2020-02-06 x Thursday Y
2020-02-07 x Friday Y
2020-02-08 x Saturday N
2020-02-09 x Sunday N
2020-02-10 x Monday Y
2020-02-11 x Tuesday Y
2020-02-12 x Wednesday Y
2020-02-13 x Thursday Y
2020-02-14 x Friday Y
2020-02-15 x Saturday N
2020-02-16 x Sunday N
There are multiple vendors with different DateFlag markers. The DateFlag indicates a day that particular vendor is working.
I have another table that has data about orders that have been placed and the networkdays table is there to give me a due date for when the order needs to be completed. Typically the vendors are allowed 3 working days to produce the order, so essentially I need to count 3 occurrences of DateFlag = 'Y' (not including date received) and then input the corresponding Date into this other set of data.
So an example would be
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
1 2020-01-27 x 2020-01-30
2 2020-01-28 x 2020-01-31
3 2020-01-29 x 2020-02-03
4 2020-01-30 x 2020-02-04
5 2020-01-31 x 2020-02-05
6 2020-02-01 x 2020-02-05
7 2020-02-02 x 2020-02-05
So being able to find the 3rd occurrence of 'Y' would help solve problems like this below:
Date Vendor DayName DateFlag
2019-12-21 x Saturday N
2019-12-22 x Sunday N
2019-12-23 x Monday Y
2019-12-24 x Tuesday N
2019-12-25 x Wednesday N
2019-12-26 x Thursday N
2019-12-27 x Friday Y
2019-12-28 x Saturday N
2019-12-29 x Sunday N
2019-12-30 x Monday Y
2019-12-31 x Tuesday N
2020-01-01 x Wednesday N
2020-01-02 x Thursday Y
2020-01-03 x Friday Y
2020-01-04 x Saturday N
2020-01-05 x Sunday N

OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
8 2019-12-21 x 2019-12-30
I have not been able to find any example of how to accomplish this online.
Thank you in advance for any help with this.

Best Answer

Answers

  • nth_value should be able to do it.

    Your query would need to filter DateFlag ='Y' , but something like this:
    select nth_value(date, 3) over (partition by vendor order by date) from NetworkDays where DateFlag = 'Y' and vendor = ''x' ;

  • Hello Vertica_Curtis,
    I really appreciate your quick response on this. I am not 100% how to make what you have provided work. Here is what I am doing:
    select
    a.*,
    nth_value(b.Date,3) over (partition by b.Vendor order by b.Date) DueDate
    from schema.test a, schema.networkdays b
    where
    b.DateFlag ='Y' and
    a.VendorId = b.Vendor and
    a.InvoicedTime::date = b.Date
    Here is a sample of the output right now:
    OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
    1 2020-01-30 x [Null]
    2 2020-01-30 x 2020-01-30
    3 2020-01-31 x 2020-01-30
    4 2020-02-03 x 2020-01-30
    5 2020-02-04 x 2020-01-30
    So I am getting a null value or only the same date over and over in the field.
    Thank you again for your help

  • Can you post a few sample rows from your two tables. I can try to recreate it in my environment.

  • I attached the full Networkdays table and a sample output from the test table. Please let me know if you need anything else.
    Thank you

  • Something like this should get it:

    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by duedate ) ;

    Your query has = to on the dates. You need to do >= so you can get all the dates into the query, in order to find the 3rd one. As an analytic, nth_value is going to return all the dates, so I'm using a limit 1 here, since the nth_value has already found the one you're interested in.

  • Thank you Vertica_Curtis, it is looking really promising so far. I haven't tested all the dates that I know are problem ones yet, but the initial look is really good. Thank you so much for this.

  • So I did find a case where it doesn't return the correct date. An order that is invoiced on 12/21/19 for vendor j should have a due date of 12/30/19. Here is the data from the networkdays table:
    12/21/2019 j Saturday N
    12/22/2019 j Sunday N
    12/23/2019 j Monday Y
    12/24/2019 j Tuesday N
    12/25/2019 j Wednesday N
    12/26/2019 j Thursday N
    12/27/2019 j Friday Y
    12/28/2019 j Saturday N
    12/29/2019 j Sunday N
    12/30/2019 j Monday Y
    12/31/2019 j Tuesday N
    When I ran the query you gave, it output 12/27/19 for the due date. I had to modify your query slightly because it gave me an error stating that the column due date didn't exist in the limit 1 over portion so I replaced it with a.InvoicedTime:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
    If I do this then it gives the expected output:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
    Any clue why that happens?

  • Looks like you changed it to the 5th value, not the 3rd. And you've adjusted the partition of your analytic to include shipBinId, which seems a little weird to me. Those are definitely going to affect all rows in how it behaves. What error were you getting?

  • When I run this:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by duedate ) ;

    SQL Error [2624] [42703]: [Vertica]VJDBC ERROR: Column "duedate" does not exist

    So then when I change it to this:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;

    I get:
    Orderid InvoicedTime Vendorid DueDate
    9 2019-12-21 06:34:43 j 2019-12-27

    If I modify the query to:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;

    I get:
    Orderid InvoicedTime Vendorid DueDate
    9 2019-12-21 06:34:43 j 2019-12-30
    just like I am supposed to per the networkdays table:
    12/21/2019 j Saturday N
    12/22/2019 j Sunday N
    12/23/2019 j Monday Y
    12/24/2019 j Tuesday N
    12/25/2019 j Wednesday N
    12/26/2019 j Thursday N
    12/27/2019 j Friday Y
    12/28/2019 j Saturday N
    12/29/2019 j Sunday N
    12/30/2019 j Monday Y
    12/31/2019 j Tuesday N

    12/30 is the expected result but when I run nth_value(b.Date, 3), I do not get the correct result and I don't understand why.
    Thank you again for your help.

  • That query did fix the problem of the column not existing so thank you for that, but the output remains the same:
    select *
    from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) as asdfghjkl
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y') foo
    limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
    Orderid InvoicedTime Vendorid asdfghjkl
    9 2019-12-21 06:34:43 j 2019-12-27

    If I change it to 4:
    select *
    from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 4) over (partition by a.VendorId, orderID order by date) as asdfghjkl
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y') foo
    limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;

    I get:
    Orderid InvoicedTime Vendorid asdfghjkl
    9 2019-12-21 06:34:43 j 2019-12-27

    But only when I change it to 5 does it give me the correct result:
    select *
    from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) as asdfghjkl
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y') foo
    limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;

    Orderid InvoicedTime Vendorid asdfghjkl
    9 2019-12-21 06:34:43 j 2019-12-30

    When you run the query, what is your output?

  • Answered in a different thread.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.