Options

Finding missing hours in a fact table.

What's the most efficient way to find missing data (timestamps) between two specified timestamps in a fact table? Ideally what I'd like to accomplish is the equivalent of what can be done in PostgreSQL by joining from a temporary table created using the generate_series function. e.g. WITH expected_hours AS ( SELECT s.tstamp AS expected_tstamp FROM GENERATE_SERIES ( '2013-08-07 00:00:00.000'::TIMESTAMP, '2013-08-07 23:59:59.999'::TIMESTAMP, '1 hour'::INTERVAL ) AS s(tstamp) ), fact_table AS ( SELECT s.tstamp AS tstamp FROM GENERATE_SERIES ( '2013-08-07 02:00:00.000'::TIMESTAMP, '2013-08-07 04:59:59.999'::TIMESTAMP, '1 hour'::INTERVAL ) AS s(tstamp) UNION ALL SELECT s.tstamp AS tstamp FROM GENERATE_SERIES ( '2013-08-07 06:00:00.000'::TIMESTAMP, '2013-08-07 21:59:59.999'::TIMESTAMP, '1 hour'::INTERVAL ) AS s(tstamp) ) SELECT expected_tstamp AS missing_tstamp FROM expected_hours LEFT OUTER JOIN fact_table ON (tstamp = expected_tstamp) WHERE tstamp IS NULL; missing_tstamp --------------------- 2013-08-07 00:00:00 2013-08-07 01:00:00 2013-08-07 05:00:00 2013-08-07 22:00:00 2013-08-07 23:00:00 (5 rows) I looked at TIME_SLICE and TIME_SERIES but couldn't figure out how to accomplish this. Thanks!

Comments

  • Options
    I managed to get it working, but only with nested selects: SELECT expected_tstamp AS missing_tstamp FROM ( SELECT ts AS expected_tstamp FROM ( SELECT ('2013-08-07 00:00:00.000'::timestamp) AS tstamp UNION SELECT ('2013-08-07 23:59:59.999'::timestamp) AS tstamp ) AS expected_hours TIMESERIES ts AS '1 hour' OVER (ORDER BY tstamp) ) AS expected_tstamps LEFT OUTER JOIN test fact_table ON (fact_table.tstamp = expected_tstamp) WHERE fact_table.tstamp IS NULL; missing_tstamp --------------------- 2013-08-07 00:00:00 2013-08-07 01:00:00 2013-08-07 05:00:00 2013-08-07 22:00:00 2013-08-07 23:00:00 (5 rows) When I try to use the SQL WITH clause, I get the following error: WITH expected_tstamps AS ( SELECT ts AS tstamp FROM ( SELECT ('2013-08-07 00:00:00.000'::timestamp) AS tstamp UNION SELECT ('2013-08-07 23:59:59.999'::timestamp) AS tstamp ) AS expected_hours TIMESERIES ts AS '1 hour' OVER (ORDER BY tstamp) ) SELECT tstamp FROM expected_tstamps LEFT OUTER JOIN test fact_table ON (fact_table.tstamp, 'HH24' = expected_tstamp) WHERE fact_table.tstamp IS NULL; ERROR 2624: Column "ts" does not exist
  • Options
    Is the use of TIMESERIES not fully support with the SQL WITH clause?
  • Options
    The LEFT OUTER JOIN was silly... this performs much better: SELECT expected_tstamp AS missing_tstamp FROM ( SELECT ts AS expected_tstamp FROM ( SELECT ('2013-08-07 00:00:00.000'::timestamp) AS tstamp UNION SELECT ('2013-08-07 23:59:59.999'::timestamp) AS tstamp ) AS expected_hours TIMESERIES ts AS '1 hour' OVER (ORDER BY tstamp) ) AS expected_tstamps EXCEPT SELECT DISTINCT fact_table.tstamp AS tstamp FROM test fact_table WHERE fact_table.tstamp BETWEEN '2013-08-07 00:00:00.000' AND '2013-08-07 23:59:59.999' ORDER BY missing_tstamp; Would still be nice to be able to reorganize it with the WITH CLAUSE though...

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file