Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

How to correctly use UTYPE hint for merging

Hi,

I have two identically sorted (much bigger than huge) datasets, and need to combine them, keeping sort order.

Vertica do have a perfect facility to accomplish it - UNION ALL with SORT clause and corresponding hints

(SELECT * FROM t1 ORDER BY x)
UNION ALL
(SELECT * FROM t2 ORDER BY x)
SORT BY X;

When executed correctly, data is merged. Straight sorting of resultset would be around two orders of magnitude less performant.

Adding required hints:

create table t(c varchar(10), n varchar(10))
order by c
segmented by hash(c) all nodes;

insert into t values('A', 'B');
insert into t values('C', 'D');

(select /+ syntactic_join */ * from t order by c)
union all /
+ UTYPE(M) */
(select * from t order by c)
order by c;

dbadmin=> (select /+ syntactic_join */ * from t order by c)
dbadmin-> union all /
+ UTYPE(M) */
dbadmin-> (select * from t order by c)
dbadmin-> order by c;
WARNING 7134: Union type hints are not feasible and will be ignored
c | n
---+---
A | B
A | B
C | D
C | D
(4 rows)

Apparently, union type hint does not work. I remember it used to work just fine.
What I am doing wrong? Both rowsets are sorted on correct key, result sorted on correct key, hints are correct. Still, UNION ALL is doing concatenate instead of merging.
That is confirmed by explain:

explain
(select /+ syntactic_join */ * from t order by c)
union all /
+ UTYPE(M) */
(select * from t order by c)
order by c;

Access Path:
+-SORT [Cost: 3K, Rows: 20K (NO STATISTICS)]
| Order: "SELECT 1".c ASC
| +---> UNION ALL [Cost: 2K, Rows: 20K (NO STATISTICS)]
| | +---> STORAGE ACCESS for t [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | | Projection: public.t_super
| | | Materialize: t.c, t.n
| | +---> STORAGE ACCESS for t [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
| | | Projection: public.t_super
| | | Materialize: t.c, t.n

What I am doing wrong - how to make UTYPE(M) work?

Answers

  • HibikiHibiki Employee

    You need to specify all columns in SELECT list to ORDER BY clause like the below.

    (select /*+ syntactic_join */ * from t order by c, n)
    union all /*+ UTYPE(M) */
    (select * from t order by c, n)
    order by c;
    

    But, unfortunately, the internal error happened in 11.0.1. Please let me report it to the engineering team.

  • Hibiki,
    Thanks for looking and finding internal error. I do not see errors nor in UDx logs, nor in vertica logs, seems to be you know vertica internals better than me :-)

    You are not correct about requirement of having all columns in select list to be present in order by.
    This feature were implemented on my request, I asked to have ability to efficiently merge two timeseries that are ordered on same key.
    Key include some fixed data, and date, time, nanos as last column(s) (effectively time in several columns, as Vertica do not have ability to store nanoseconds).
    Before I had to do union all and sort combined dataset, and on very large billion rows dataset sort was a bottleneck, taking 99% of time. That was very inefficient, as both source datasets already ordered on same key. Merge would be extremely efficient.
    That is exactly how it was implemented. Working construct was like this, and got around two order of magnitude performance improvement with UTYPE(M) hint, on 1 bln rows:

    (select * from t1 order by X)
    union all
    (select * from t2 order by X)
    order by X;

    See, X should not be all columns in list.

    I am very interested to make this hint working again.
    Thank you
    Sergey

  • HibikiHibiki Employee

    @Sergey_Cherepan_1 This is just information. The internal error will be fixed in the next release.

    As for the requirement, please refer to the following description.
    This option requires all input from the source queries to use the same sort order
    https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Hints/UType.htm

    So, please try to make SELECT and ORDER BY have the same column list.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.