Insert random values in Fact Table
Hello,
I'm trying to fill a table with values. It should pick foreign keys from the tables at random.
What happens is that every row inserted is exactly the same.
Thanks for your support.
`
insert into sales (sold_units, total_sales_amount, calendar_date_id, location_location_id, product_product_id, organization_organization_id)
select
(1 + RANDOMINT(100)) AS sold_units,
(10000 + RANDOM()*50000)::NUMERIC(7,2) AS total_sales_amount, (
select dt_id from calendar group by dt_id having dt_id>=(randomint(max(dt_id)-min(dt_id)) + min(dt_id)) limit 1) as calendar_date_id,(
select location_id from location order by random() limit 1) as location_location_id,(
select organization_id from organization order by random() limit 1) as organization_organization_id,(
select product_id from product order by random() limit 1) as product_product_id
FROM
( SELECT tm FROM
( SELECT now() + INTERVAL '1 second' AS t UNION ALL
SELECT now() + INTERVAL '1000000 seconds' AS t
) x TIMESERIES tm AS '1 second' OVER(ORDER BY t)
) y
;
COMMIT;
`
Answers
The fullselects used as entries in the select list are only evaluated once in the query plan, and the resulting values are then used for all rows that the query produces.
Works as designed, I'm afraid.
There is a better approach - at least I think there is.
If you use
RANDOM() and similar to produce the fact table rows, there's a high probability that you also use that technique for the generation of the foreign keys. For example, if you generated 10,000 product dimension rows, and used 10,000 consecutive integers, starting with 1 for the product ID, useRANDOMINT(10000)+1` for the product foreign key in the fact table - I hope that makes sense ...Does sample data use make sense here?
See: https://www.vertica.com/docs/latest/HTML/Content/Authoring/AnalyzingData/MachineLearning/DataPreparation/SamplingData.htm
\set DEMO_ROWS 1000000 DROP TABLE t1 CASCADE; DROP TABLE t2 CASCADE; create table public.t1 ( row_id int, f2 datetime); create table public.t2 ( row_id int, f2 datetime); -- Generate some data in t1 INSERT INTO t1 with myrows as (select row_number() over() as row_id, sysdate() + RANDOMINT(365) as f2 from ( select 1 from ( select now() as se union all select now() + :DEMO_ROWS - 1 as se) a timeseries ts as '1 day' over (order by se)) b) select row_id, f2 from myrows order by row_id; COMMIT; -- Insert random data from table t1 into table t2 INSERT INTO t2 SELECT * FROM t1 TABLESAMPLE(0.001) limit 10; COMMIT; SELECT * FROM t1 ORDER BY 1 limit 10; row_id | f2 --------+---------------------------- 1 | 2021-02-19 09:56:55.156364 2 | 2021-10-31 09:56:55.156364 3 | 2021-03-11 09:56:55.156364 4 | 2021-08-20 09:56:55.156364 5 | 2021-01-10 09:56:55.156364 6 | 2021-02-13 09:56:55.156364 7 | 2021-03-09 09:56:55.156364 8 | 2021-09-11 09:56:55.156364 9 | 2020-12-03 09:56:55.156364 10 | 2021-09-26 09:56:55.156364 (10 rows) SELECT * FROM t2 ORDER BY 1; row_id | f2 --------+---------------------------- 114186 | 2021-07-11 09:56:55.156364 141080 | 2021-02-01 09:56:55.156364 308290 | 2021-01-17 09:56:55.156364 327098 | 2021-01-04 09:56:55.156364 340480 | 2021-08-12 09:56:55.156364 485222 | 2021-07-02 09:56:55.156364 554672 | 2021-09-16 09:56:55.156364 769993 | 2020-11-28 09:56:55.156364 956760 | 2021-06-30 09:56:55.156364 960165 | 2021-11-01 09:56:55.156364 (10 rows)