Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Inconsistency in vertica query results

I have a relatively complicated query that is supposed to calculate standard deviation of BidPrice differences within each hour of a day:

with ticks as (
select *, TickTime at time zone 'Europe/London' at time zone 'utc' as TickTimeGMT from TickHistory.Futures
where (TickType = 'Quote' and BidPrice < AskPrice)
and RIC in ('FGBLM0') and TickTime > '2020-03-19 00:00:00 UTC' and TickTime < '2020-03-20 00:00:00 UTC'
),
bars as (
SELECT TimeSlice, TS_LAST_VALUE(BidPrice, 'const') as BidPrice
from ticks
TIMESERIES TimeSlice AS '1 seconds' over (ORDER BY TickTimeGMT)
),
differences as (
SELECT TimeSlice, (LEAD(BidPrice) over (order by TimeSlice) - BidPrice) as BidPriceDiff
from bars
)
SELECT DISTINCT TIME_SLICE(TimeSlice, 3600, 'SECOND') at time zone 'Europe/London' at time zone 'utc' as BarTime,
STDDEV(BidPriceDiff) OVER (PARTITION BY TIME_SLICE(TimeSlice, 3600, 'SECOND')) as Volatility
from differences;

Everything in this query seems deterministic. Nevertheless, every time I execute this query I get slightly but noticeably different results. What could be the reason for it and is there a way to improve the query to get stable results?

Tagged:

Answers

  • Jim_KnicelyJim_Knicely Administrator

    Can you post the DDL for the TickHistory.Futures table?

  • Thanks Jim, here it is:

    CREATE TABLE TickHistory.futures
    (
    RIC varchar(50) NOT NULL,
    AliasUnderlyingRIC varchar(50),
    TickDomain varchar(25),
    TickTime timestamp NOT NULL,
    GMTOffset int,
    TickType varchar(25) NOT NULL,
    Price float,
    Volume int,
    BidPrice float,
    BidSize int,
    AskPrice float,
    AskSize int,
    Qualifiers varchar(255),
    DateTimeTicks int NOT NULL,
    InsertDate int NOT NULL,
    InsertCount int NOT NULL
    )
    PARTITION BY (((date_part('year', futures.TickTime) * 100) + date_part('month', futures.TickTime)));

  • Hi Jim,
    If a hard table is used rather than a temp the result set remains constant:-
    -- Create hard table with the data
    select * into TickHistory.Futures_STDDEV from TickHistory.Futures
    where (TickType = 'Quote' and BidPrice < AskPrice)
    and RIC in ('FGBLM0') and TickTime > '2020-03-19 00:00:00 UTC' and TickTime < '2020-03-20 00:00:00 UTC';

    select count(*) from TickHistory.Futures_STDDEV;
    1,705,710

    -- Then run STDDEV against the hard table, the numbers remain unchanged.
    with ticks as (
    select *, TickTime at time zone 'Europe/London' at time zone 'utc' as TickTimeGMT from TickHistory.Futures_STDDEV
    ),
    bars as (
    SELECT TimeSlice, TS_LAST_VALUE(BidPrice, 'const') as BidPrice
    from ticks
    TIMESERIES TimeSlice AS '1 seconds' over (ORDER BY TickTimeGMT)
    ),
    differences as (
    SELECT TimeSlice, (LEAD(BidPrice) over (order by TimeSlice) - BidPrice) as BidPriceDiff
    from bars
    )
    SELECT DISTINCT TIME_SLICE(TimeSlice, 3600, 'SECOND') at time zone 'Europe/London' at time zone 'utc' as BarTime,
    STDDEV(BidPriceDiff) OVER (PARTITION BY TIME_SLICE(TimeSlice, 3600, 'SECOND')) as Volatility
    from differences;

  • LaRampalLaRampal Employee

    Hi Tim,

    The simple design for stddev on float just isn't that stable, because float has rounding to it and so depending on the order the rows are processed you get different answers. This is very noticeable when the magitude of the deviation is small compared to the magnitude of the values going in... to the point where stddev(8+ digit constant) is not always 0.

  • Understood, thank you for the answer!

  • Or you can apply below trick to get a constant STDDEV value for your query.
    The only downside is that you need to work on adding 1 more layer of subquery.
    This may help you get consistent value of STDDEV.

    SELECT col1::int
    ,SQRT(SUM(POWER(col2-avg_analytic,2)) / (cnt_analytic-1) ) AS "STDDEV_n-1(STDDEV)"
    FROM (
    SELECT
    aaa.*
    ,AVG(col2) OVER (PARTITION BY col1) as avg_analytic
    ,COUNT(col2) OVER (PARTITION BY col1) as cnt_analytic
    from aaa
    ) bbb
    GROUP BY col1 ,cnt_analytic
    ORDER BY 1;

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.