Vertica pattern match feature
Hi, I found Vertica pattern match was pretty useful, I have tested a few case and got expected result.
but I was puzzled this sql, maybe you guys can give me some hints.
date pct pre_val baseline t_value (baseline1.05) event_name match_id
9/5/2019 2.60% 1.26% 2.09% 2.19% start 1
9/6/2019 2.61% 2.60% 2.09% 2.19% start 2
9/7/2019 2.60% 2.61% 2.09% 2.19% up 3
9/8/2019 2.55% 2.60% 2.09% 2.19% up 4
9/9/2019 2.54% 2.55% 2.09% 2.19% up 5
--pattern match statement
DEFINE
start as down_pct > pre_val,
up as median_ratio is not null and down_pct > baseline1.05, --t_value
PATTERN
test as (start up+ )
ROWS MATCH first EVENT;
I got the pattern from 9/6 to 9/9, but from data, it can clearly see the pattern start at 9/05.
Did I miss anything?
if I used: test as (start start up+ ) , I saw the pattern would start 09/05.
Thanks in advance!
-David
Best Answers
-
marcothesane - Select Field - Administrator
As you have no column named
median_ratio
in your sample data, I can only guess.
But in this pattern, you already supply event names and match id, so I won't bother.
But the event series is astart
, then anotherstart
, and then a series of 3up
events.So
(start up+)
is one eventstart
followed by severalup
events. This makes it start at2019-09-06
, and the row of2019-09-05
falls out of the pattern.If you want the row of
2019-09-05
to be part of the pattern, it's either(start start up+)
or(start+ up+)
- or also(start up*)
to get two patterns, of which one consists of a single row.
Hope this helps ...1 -
marcothesane - Select Field - Administrator
The order of action is:
1. For each row, check each of the events in the DEFINE clause. With the default, ROWS MATCH FIRST EVENT, stop checking as soon as one event's Boolean expression is true. With that, both rows of 2019-09-05 and 2019-09-06 get the event_namestart
: Given thatdown_pct > pre_val
is true in both cases, theup
event is not even evaluated.
2. Check the pattern specified. In our case, initially:(start up+)
. That means one occurrence ofstart
, not two, followed by one or more occurrences ofup
.This leaves no other option than to let the row with 2019-09-05 fall outside of the pattern you're searching for.
5 -
marcothesane - Select Field - Administrator
(start up*)
is one ofstart
, followed by zero, one or moreup
. So it can be just astart
line.
You are getting two patterns, a one-lined one, and another with more lines. Check the output of thepattern_id()
function here below, and note that you're getting two pattern ID-s ...SELECT * , event_name() , pattern_id() , match_id() FROM input MATCH( PARTITION BY 1 ORDER BY dt DEFINE start AS down_pct > pre_val , up AS down_pct > baseline*1.05 PATTERN p AS (start up*) ); -- out dt | down_pct | pre_val | baseline | t_value | event_name | pattern_id | match_id -- out ------------+----------+---------+----------+---------+------------+------------+---------- -- out 2019-09-05 | 2.60 | 1.26 | 2.09 | 2.19 | start | 1 | 1 -- out 2019-09-06 | 2.61 | 2.60 | 2.09 | 2.19 | start | 2 | 1 -- out 2019-09-07 | 2.60 | 2.61 | 2.09 | 2.19 | up | 2 | 2 -- out 2019-09-08 | 2.55 | 2.60 | 2.09 | 2.19 | up | 2 | 3 -- out 2019-09-09 | 2.54 | 2.55 | 2.09 | 2.19 | up | 2 | 4 -- out (5 rows) -- out -- out Time: First fetch (5 rows): 65.666 ms. All rows formatted: 65.771 ms
5
Answers
thanks, Marcothesne!
I am still not quite understand , why 2019-09-05 and 2019-09-06 both are been treated as "Start" event, why not "2019-09-05" as "Start" and "2019-09-06" as "up" event? both of them fulfilled: start as down_pct > pre_val, up as down_pct > baseline*1.05.
I thought I used "ROWS MATCH first EVENT", it should pick "09/05" as start and "09/06" as up event?
ROWS MATCH FIRST EVENT means that the first Boolean expression, in the events list, that evaluates to TRUE is picked, and the following ones won't even be evaluated. ROWS MATCH ALL EVENTS would cause an error if two Booleans evaluate to TRUE:
ERROR: pattern events must be mutually exclusive
HINT: try using ROWS MATCH FIRST EVENT
agree..but if I used: "ROWS MATCH first EVENT", it should pick "09/05" as start and "09/06" as up event?
sorry, I am still not get why "09/05" did not get pick?
ROWS MATCH FIRST EVENT refers to the events in the DEFINE clause, in the order in which you specify them in there, not to the events in the pattern you're searching.
great! I got it now..
one more question: start up* and start up+ different?
why start up* can catch this pattern?
thanks, now I totally understand...I thought Vertica treat "start,up" as a single pattern, that is the reason I got puzzled.
if Vertica treated "start" and "up" into two different events, now it is clear.
great support!