argmax aggregate function
Hi all,
I am wondering if there is an implementation for the argmax aggregate function in vertica.
ARGMAX(col_a, col_c) returns col_c's value, for which col_a has the largest value.
This can also be used within analytic windows and ranges.
Thanks,
Amos.
0
Comments
Hi, No vertica does not have that function but we do have the ability to write your own functions, this seems a good candidate for a User Define Transform Function (UDTFs)
You can write UDTFs in Java, C++, Python, this is the link to the doc for details.
UDTFS Doc
There is also other ways to implement just with SQL statements such as:
select col_c from table A where col_a = ( select col_a from table A order by col_a order by col_a desc limit 1) a;
But this may make your sql complex.
Hope this helps,
Eugenia
Thanks Eugenia
Is there a way to create a feature request for this? I'm assuming UDTFS are not as fast as native functions. Am I right?
Best,
Amos.
I will open a feature request.
UDFTS can be fast if you design it and developed right. If you do in C++ and you are sure that it works at it should be you can do it in Un fenced mode so the main Vertica process run it as it will be as fast as native ( if the code is done properly)
Eugenia
Hi,
The thing is that this kind of aggregate function needs to materialize the second argument, only after we find the max of the first one, and not for every row. So I don't think is would be as efficient as possible.
In addition, from what I read in the docs, only one UDTF is allowed in the select clause, without anything else, which misses the whole point of issuing a a query such:
-- Getting the highest earning employee in each department
select argmax(salery, employee_name), department
from employees
group by department;
or even:
-- For each employee, get the salery diff to the department's senior
select employee_name, argmax(age, salery) over (partition by department) - salery as salery_diff_to_senior
from employees;
Is there a way keep track on this feature request?
Thanks,
Amos.