how to get the first (or any single) value in GROUP BY without Postgres's ARRAY_AGG-like funtionalit

Hi Folks. I'm migrating some SQL from PostgreSQL 9.2 to Vertica 7.0, and I could use some help replacing postgres's cool [array_agg][1] feature with something that [Vertica][2] (and possibly other RDBMS) supports, such as partitions and window functions. I'm new to these features, and I'd really appreciate your ideas.

The (working) postgres query using array_agg ( http://sqlfiddle.com/#!12/a92d5/3/0 ):

    SELECT B.id, (array_agg(A.X))[1]
    FROM B, AB, A
    WHERE B.id = AB.B_id AND A.id = AB.A_id AND A.X IS NOT NULL
    GROUP BY B.id;

If I try to naively select A.X by itself without the aggregation (i.e., to let the RDBMS pick - actually works with MySQL and SQLite), Vertica and Postgres complain. Running the same query in Vertica but with "A.X" instead of "(array_agg(A.X))[1]":

    ERROR 2640:  Column "A.X" must appear in the GROUP BY clause or be used in an aggregate function

I was thinking of trying a window function, e.g., something like from [this question][3]:

    SELECT email, FIRST_VALUE(email) OVER (PARTITION BY email)
    FROM questions
    GROUP BY email;

but I get the same error:

    SELECT B.id, FIRST_VALUE(A.X) OVER (PARTITION BY A.id)
    FROM B, AB, A
    WHERE B.id = AB.B_id AND A.id = AB.A_id AND A.X IS NOT NULL
    GROUP BY B.id;

    ERROR 2640:  Column "A.X" must appear in the GROUP BY clause or be used in an aggregate function

Note that we don't care so much about getting the first value, we just need any (ideally deterministic) single value. Any ideas would be very welcome!


  [1]: http://www.postgresql.org/docs/9.2/static/functions-aggregate.html
  [2]: https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/SQLReferenceManual/SQLReferenceManual.htm...
  [3]: http://stackoverflow.com/questions/18686906/postgres-windows-function-with-aggregate-group-by/186928...





Comments

  • Hi Matthew,

    You're actually hinting at the underlying issue at the end of your post -- Vertica is refusing to give a non-deterministic answer.  You have to tell it how to pick which record to show.

    The error-message text actually hints at how to do this -- use an aggregate function.  For example, a simple solution would be:

    SELECT B.id, MIN(A.X) ...

    Note that Vertica, unlike PostgreSQL, stores its data sorted.  If you have a projection on A that is sorted on (A.id, A.X), then finding the min X for a given id is trivial, so very fast -- we just take the first one.  The Database Designer can set up this projection for you if you need it.

    Adam


  • That's super helpful, Adam. We realized MIN() would work fine. Much obliged!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file