argmax aggregate function

amosmossamosmoss Registered User

Hi all,

I am wondering if there is an implementation for the argmax aggregate function in vertica.
ARGMAX(col_a, col_c) returns col_c's value, for which col_a has the largest value.

This can also be used within analytic windows and ranges.

Thanks,
Amos.

Comments

  • emorenoemoreno Employee, Registered User, VerticaExpert

    Hi, No vertica does not have that function but we do have the ability to write your own functions, this seems a good candidate for a User Define Transform Function (UDTFs)

    You can write UDTFs in Java, C++, Python, this is the link to the doc for details.
    UDTFS Doc

    There is also other ways to implement just with SQL statements such as:

    select col_c from table A where col_a = ( select col_a from table A order by col_a order by col_a desc limit 1) a;

    But this may make your sql complex.

    Hope this helps,

    Eugenia

  • amosmossamosmoss Registered User

    Thanks Eugenia :)

    Is there a way to create a feature request for this? I'm assuming UDTFS are not as fast as native functions. Am I right?

    Best,
    Amos.

  • emorenoemoreno Employee, Registered User, VerticaExpert

    I will open a feature request.

    UDFTS can be fast if you design it and developed right. If you do in C++ and you are sure that it works at it should be you can do it in Un fenced mode so the main Vertica process run it as it will be as fast as native ( if the code is done properly)

    Eugenia

  • amosmossamosmoss Registered User
    edited April 2017

    Hi,

    The thing is that this kind of aggregate function needs to materialize the second argument, only after we find the max of the first one, and not for every row. So I don't think is would be as efficient as possible.

    In addition, from what I read in the docs, only one UDTF is allowed in the select clause, without anything else, which misses the whole point of issuing a a query such:

    -- Getting the highest earning employee in each department
    select argmax(salery, employee_name), department
    from employees
    group by department;

    or even:
    -- For each employee, get the salery diff to the department's senior
    select employee_name, argmax(age, salery) over (partition by department) - salery as salery_diff_to_senior
    from employees;

    Is there a way keep track on this feature request?

    Thanks,
    Amos.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file