Insert random values in Fact Table

franciscoCA · November 2020

Hello,
I'm trying to fill a table with values. It should pick foreign keys from the tables at random.
What happens is that every row inserted is exactly the same.

Thanks for your support.

`
insert into sales (sold_units, total_sales_amount, calendar_date_id, location_location_id, product_product_id, organization_organization_id)
select
(1 + RANDOMINT(100)) AS sold_units,
(10000 + RANDOM()*50000)::NUMERIC(7,2) AS total_sales_amount, (
select dt_id from calendar group by dt_id having dt_id>=(randomint(max(dt_id)-min(dt_id)) + min(dt_id)) limit 1) as calendar_date_id,(
select location_id from location order by random() limit 1) as location_location_id,(
select organization_id from organization order by random() limit 1) as organization_organization_id,(
select product_id from product order by random() limit 1) as product_product_id

    FROM
            ( SELECT tm FROM
            ( SELECT now() + INTERVAL '1 second' AS t UNION ALL
                    SELECT now() + INTERVAL '1000000 seconds' AS t
            ) x TIMESERIES tm AS '1 second' OVER(ORDER BY t)
    ) y

;

COMMIT;
`

marcothesane · November 2020

The fullselects used as entries in the select list are only evaluated once in the query plan, and the resulting values are then used for all rows that the query produces.

Works as designed, I'm afraid.

There is a better approach - at least I think there is.

If you use RANDOM() and similar to produce the fact table rows, there's a high probability that you also use that technique for the generation of the foreign keys. For example, if you generated 10,000 product dimension rows, and used 10,000 consecutive integers, starting with 1 for the product ID, useRANDOMINT(10000)+1` for the product foreign key in the fact table - I hope that makes sense ...

mosheg · November 2020

Does sample data use make sense here?
See: https://www.vertica.com/docs/latest/HTML/Content/Authoring/AnalyzingData/MachineLearning/DataPreparation/SamplingData.htm

\set DEMO_ROWS 1000000
DROP TABLE t1 CASCADE;
DROP TABLE t2 CASCADE;

create table public.t1 ( row_id int, f2 datetime);
create table public.t2 ( row_id int, f2 datetime);

-- Generate some data in t1
INSERT INTO t1
with myrows as (select
row_number() over() as row_id, sysdate() + RANDOMINT(365) as f2
from ( select 1 from ( select now() as se union all
select now() + :DEMO_ROWS - 1 as se) a timeseries ts as '1 day' over (order by se)) b)
select row_id, f2
from myrows
order by row_id;
COMMIT;

-- Insert random data from table t1 into table t2
INSERT INTO t2 SELECT * FROM t1 TABLESAMPLE(0.001) limit 10;
COMMIT;

SELECT * FROM t1 ORDER BY 1 limit 10;
 row_id |             f2
--------+----------------------------
      1 | 2021-02-19 09:56:55.156364
      2 | 2021-10-31 09:56:55.156364
      3 | 2021-03-11 09:56:55.156364
      4 | 2021-08-20 09:56:55.156364
      5 | 2021-01-10 09:56:55.156364
      6 | 2021-02-13 09:56:55.156364
      7 | 2021-03-09 09:56:55.156364
      8 | 2021-09-11 09:56:55.156364
      9 | 2020-12-03 09:56:55.156364
     10 | 2021-09-26 09:56:55.156364
(10 rows)

SELECT * FROM t2 ORDER BY 1;
 row_id |             f2
--------+----------------------------
 114186 | 2021-07-11 09:56:55.156364
 141080 | 2021-02-01 09:56:55.156364
 308290 | 2021-01-17 09:56:55.156364
 327098 | 2021-01-04 09:56:55.156364
 340480 | 2021-08-12 09:56:55.156364
 485222 | 2021-07-02 09:56:55.156364
 554672 | 2021-09-16 09:56:55.156364
 769993 | 2020-11-28 09:56:55.156364
 956760 | 2021-06-30 09:56:55.156364
 960165 | 2021-11-01 09:56:55.156364
(10 rows)

We're Moving!

Create My New Community Account Now

Insert random values in Fact Table

Answers

Leave a Comment