We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


How do I find the 3rd occurrence in a column and get the corresponding data? — Vertica Forum

How do I find the 3rd occurrence in a column and get the corresponding data?

Elemist11Elemist11 Community Edition User

I have a lookup table called networkdays, here is a sample of the data:
Date Vendor DayName DateFlag
2020-01-27 x Monday N
2020-01-28 x Tuesday Y
2020-01-29 x Wednesday Y
2020-01-30 x Thursday Y
2020-01-31 x Friday Y
2020-02-01 x Saturday N
2020-02-02 x Sunday N
2020-02-03 x Monday Y
2020-02-04 x Tuesday Y
2020-02-05 x Wednesday Y
2020-02-06 x Thursday Y
2020-02-07 x Friday Y
2020-02-08 x Saturday N
2020-02-09 x Sunday N
2020-02-10 x Monday Y
2020-02-11 x Tuesday Y
2020-02-12 x Wednesday Y
2020-02-13 x Thursday Y
2020-02-14 x Friday Y
2020-02-15 x Saturday N
2020-02-16 x Sunday N
There are multiple vendors with different DateFlag markers. The DateFlag indicates a day that particular vendor is working.
I have another table that has data about orders that have been placed and the networkdays table is there to give me a due date for when the order needs to be completed. Typically the vendors are allowed 3 working days to produce the order, so essentially I need to count 3 occurrences of DateFlag = 'Y' (not including date received) and then input the corresponding Date into this other set of data.
So an example would be
OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
1 2020-01-27 x 2020-01-30
2 2020-01-28 x 2020-01-31
3 2020-01-29 x 2020-02-03
4 2020-01-30 x 2020-02-04
5 2020-01-31 x 2020-02-05
6 2020-02-01 x 2020-02-05
7 2020-02-02 x 2020-02-05
So being able to find the 3rd occurrence of 'Y' would help solve problems like this below:
Date Vendor DayName DateFlag
2019-12-21 x Saturday N
2019-12-22 x Sunday N
2019-12-23 x Monday Y
2019-12-24 x Tuesday N
2019-12-25 x Wednesday N
2019-12-26 x Thursday N
2019-12-27 x Friday Y
2019-12-28 x Saturday N
2019-12-29 x Sunday N
2019-12-30 x Monday Y
2019-12-31 x Tuesday N
2020-01-01 x Wednesday N
2020-01-02 x Thursday Y
2020-01-03 x Friday Y
2020-01-04 x Saturday N
2020-01-05 x Sunday N

OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
8 2019-12-21 x 2019-12-30
I have not been able to find any example of how to accomplish this online.
Thank you in advance for any help with this.

Best Answer

Answers

  • nth_value should be able to do it.

    Your query would need to filter DateFlag ='Y' , but something like this:
    select nth_value(date, 3) over (partition by vendor order by date) from NetworkDays where DateFlag = 'Y' and vendor = ''x' ;

  • Elemist11Elemist11 Community Edition User

    Hello Vertica_Curtis,
    I really appreciate your quick response on this. I am not 100% how to make what you have provided work. Here is what I am doing:
    select
    a.*,
    nth_value(b.Date,3) over (partition by b.Vendor order by b.Date) DueDate
    from schema.test a, schema.networkdays b
    where
    b.DateFlag ='Y' and
    a.VendorId = b.Vendor and
    a.InvoicedTime::date = b.Date
    Here is a sample of the output right now:
    OrderNum InvoicedDate Vendor DueDate(3rd occurrence of DateFlag ='Y' from networkdays table)
    1 2020-01-30 x [Null]
    2 2020-01-30 x 2020-01-30
    3 2020-01-31 x 2020-01-30
    4 2020-02-03 x 2020-01-30
    5 2020-02-04 x 2020-01-30
    So I am getting a null value or only the same date over and over in the field.
    Thank you again for your help

  • Can you post a few sample rows from your two tables. I can try to recreate it in my environment.

  • Elemist11Elemist11 Community Edition User

    I attached the full Networkdays table and a sample output from the test table. Please let me know if you need anything else.
    Thank you

  • Something like this should get it:

    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by duedate ) ;

    Your query has = to on the dates. You need to do >= so you can get all the dates into the query, in order to find the 3rd one. As an analytic, nth_value is going to return all the dates, so I'm using a limit 1 here, since the nth_value has already found the one you're interested in.

  • Elemist11Elemist11 Community Edition User

    Thank you Vertica_Curtis, it is looking really promising so far. I haven't tested all the dates that I know are problem ones yet, but the initial look is really good. Thank you so much for this.

  • Elemist11Elemist11 Community Edition User

    So I did find a case where it doesn't return the correct date. An order that is invoiced on 12/21/19 for vendor j should have a due date of 12/30/19. Here is the data from the networkdays table:
    12/21/2019 j Saturday N
    12/22/2019 j Sunday N
    12/23/2019 j Monday Y
    12/24/2019 j Tuesday N
    12/25/2019 j Wednesday N
    12/26/2019 j Thursday N
    12/27/2019 j Friday Y
    12/28/2019 j Saturday N
    12/29/2019 j Sunday N
    12/30/2019 j Monday Y
    12/31/2019 j Tuesday N
    When I ran the query you gave, it output 12/27/19 for the due date. I had to modify your query slightly because it gave me an error stating that the column due date didn't exist in the limit 1 over portion so I replaced it with a.InvoicedTime:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
    If I do this then it gives the expected output:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.Vendorid, OrderId order by b.Date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over (partition by a.VendorId, ShipBinId order by a.InvoicedTime) ;
    Any clue why that happens?

  • Looks like you changed it to the 5th value, not the 3rd. And you've adjusted the partition of your analytic to include shipBinId, which seems a little weird to me. Those are definitely going to affect all rows in how it behaves. What error were you getting?

  • Elemist11Elemist11 Community Edition User

    When I run this:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by duedate ) ;

    SQL Error [2624] [42703]: [Vertica]VJDBC ERROR: Column "duedate" does not exist

    So then when I change it to this:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;

    I get:
    Orderid InvoicedTime Vendorid DueDate
    9 2019-12-21 06:34:43 j 2019-12-27

    If I modify the query to:
    SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) DueDate
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y'
    limit 1 over(partition by VendorId, OrderId order by a.InvoicedTime) ;

    I get:
    Orderid InvoicedTime Vendorid DueDate
    9 2019-12-21 06:34:43 j 2019-12-30
    just like I am supposed to per the networkdays table:
    12/21/2019 j Saturday N
    12/22/2019 j Sunday N
    12/23/2019 j Monday Y
    12/24/2019 j Tuesday N
    12/25/2019 j Wednesday N
    12/26/2019 j Thursday N
    12/27/2019 j Friday Y
    12/28/2019 j Saturday N
    12/29/2019 j Sunday N
    12/30/2019 j Monday Y
    12/31/2019 j Tuesday N

    12/30 is the expected result but when I run nth_value(b.Date, 3), I do not get the correct result and I don't understand why.
    Thank you again for your help.

  • Elemist11Elemist11 Community Edition User

    That query did fix the problem of the column not existing so thank you for that, but the output remains the same:
    select *
    from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 3) over (partition by a.VendorId, orderID order by date) as asdfghjkl
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y') foo
    limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;
    Orderid InvoicedTime Vendorid asdfghjkl
    9 2019-12-21 06:34:43 j 2019-12-27

    If I change it to 4:
    select *
    from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 4) over (partition by a.VendorId, orderID order by date) as asdfghjkl
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y') foo
    limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;

    I get:
    Orderid InvoicedTime Vendorid asdfghjkl
    9 2019-12-21 06:34:43 j 2019-12-27

    But only when I change it to 5 does it give me the correct result:
    select *
    from ( SELECT a.OrderId, a.InvoicedTime, a.VendorId
    , nth_value(b.Date, 5) over (partition by a.VendorId, orderID order by date) as asdfghjkl
    FROM nth_test a
    JOIN networkdays b ON b.Date >= a.InvoicedTime and b.Vendor = a.VendorId
    WHERE b.DateFlag = 'Y') foo
    limit 1 over(partition by VendorId, OrderId order by asdfghjkl ) ;

    Orderid InvoicedTime Vendorid asdfghjkl
    9 2019-12-21 06:34:43 j 2019-12-30

    When you run the query, what is your output?

  • Elemist11Elemist11 Community Edition User

    Answered in a different thread.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file