We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Finding missing hours in a fact table. — Vertica Forum

Finding missing hours in a fact table.

What's the most efficient way to find missing data (timestamps) between two specified timestamps in a fact table? Ideally what I'd like to accomplish is the equivalent of what can be done in PostgreSQL by joining from a temporary table created using the generate_series function. e.g. WITH expected_hours AS ( SELECT s.tstamp AS expected_tstamp FROM GENERATE_SERIES ( '2013-08-07 00:00:00.000'::TIMESTAMP, '2013-08-07 23:59:59.999'::TIMESTAMP, '1 hour'::INTERVAL ) AS s(tstamp) ), fact_table AS ( SELECT s.tstamp AS tstamp FROM GENERATE_SERIES ( '2013-08-07 02:00:00.000'::TIMESTAMP, '2013-08-07 04:59:59.999'::TIMESTAMP, '1 hour'::INTERVAL ) AS s(tstamp) UNION ALL SELECT s.tstamp AS tstamp FROM GENERATE_SERIES ( '2013-08-07 06:00:00.000'::TIMESTAMP, '2013-08-07 21:59:59.999'::TIMESTAMP, '1 hour'::INTERVAL ) AS s(tstamp) ) SELECT expected_tstamp AS missing_tstamp FROM expected_hours LEFT OUTER JOIN fact_table ON (tstamp = expected_tstamp) WHERE tstamp IS NULL; missing_tstamp --------------------- 2013-08-07 00:00:00 2013-08-07 01:00:00 2013-08-07 05:00:00 2013-08-07 22:00:00 2013-08-07 23:00:00 (5 rows) I looked at TIME_SLICE and TIME_SERIES but couldn't figure out how to accomplish this. Thanks!

Comments

  • I managed to get it working, but only with nested selects: SELECT expected_tstamp AS missing_tstamp FROM ( SELECT ts AS expected_tstamp FROM ( SELECT ('2013-08-07 00:00:00.000'::timestamp) AS tstamp UNION SELECT ('2013-08-07 23:59:59.999'::timestamp) AS tstamp ) AS expected_hours TIMESERIES ts AS '1 hour' OVER (ORDER BY tstamp) ) AS expected_tstamps LEFT OUTER JOIN test fact_table ON (fact_table.tstamp = expected_tstamp) WHERE fact_table.tstamp IS NULL; missing_tstamp --------------------- 2013-08-07 00:00:00 2013-08-07 01:00:00 2013-08-07 05:00:00 2013-08-07 22:00:00 2013-08-07 23:00:00 (5 rows) When I try to use the SQL WITH clause, I get the following error: WITH expected_tstamps AS ( SELECT ts AS tstamp FROM ( SELECT ('2013-08-07 00:00:00.000'::timestamp) AS tstamp UNION SELECT ('2013-08-07 23:59:59.999'::timestamp) AS tstamp ) AS expected_hours TIMESERIES ts AS '1 hour' OVER (ORDER BY tstamp) ) SELECT tstamp FROM expected_tstamps LEFT OUTER JOIN test fact_table ON (fact_table.tstamp, 'HH24' = expected_tstamp) WHERE fact_table.tstamp IS NULL; ERROR 2624: Column "ts" does not exist
  • Is the use of TIMESERIES not fully support with the SQL WITH clause?
  • The LEFT OUTER JOIN was silly... this performs much better: SELECT expected_tstamp AS missing_tstamp FROM ( SELECT ts AS expected_tstamp FROM ( SELECT ('2013-08-07 00:00:00.000'::timestamp) AS tstamp UNION SELECT ('2013-08-07 23:59:59.999'::timestamp) AS tstamp ) AS expected_hours TIMESERIES ts AS '1 hour' OVER (ORDER BY tstamp) ) AS expected_tstamps EXCEPT SELECT DISTINCT fact_table.tstamp AS tstamp FROM test fact_table WHERE fact_table.tstamp BETWEEN '2013-08-07 00:00:00.000' AND '2013-08-07 23:59:59.999' ORDER BY missing_tstamp; Would still be nice to be able to reorganize it with the WITH CLAUSE though...

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file