Insert random values in Fact Table

Hello,
I'm trying to fill a table with values. It should pick foreign keys from the tables at random.
What happens is that every row inserted is exactly the same.

Thanks for your support.

`
insert into sales (sold_units, total_sales_amount, calendar_date_id, location_location_id, product_product_id, organization_organization_id)
select
(1 + RANDOMINT(100)) AS sold_units,
(10000 + RANDOM()*50000)::NUMERIC(7,2) AS total_sales_amount, (
select dt_id from calendar group by dt_id having dt_id>=(randomint(max(dt_id)-min(dt_id)) + min(dt_id)) limit 1) as calendar_date_id,(
select location_id from location order by random() limit 1) as location_location_id,(
select organization_id from organization order by random() limit 1) as organization_organization_id,(
select product_id from product order by random() limit 1) as product_product_id

    FROM
            ( SELECT tm FROM
            ( SELECT now() + INTERVAL '1 second' AS t UNION ALL
                    SELECT now() + INTERVAL '1000000 seconds' AS t
            ) x TIMESERIES tm AS '1 second' OVER(ORDER BY t)
    ) y

;

COMMIT;
`

Answers

  • marcothesanemarcothesane - Select Field - Administrator

    The fullselects used as entries in the select list are only evaluated once in the query plan, and the resulting values are then used for all rows that the query produces.

    Works as designed, I'm afraid.

    There is a better approach - at least I think there is.

    If you use RANDOM() and similar to produce the fact table rows, there's a high probability that you also use that technique for the generation of the foreign keys. For example, if you generated 10,000 product dimension rows, and used 10,000 consecutive integers, starting with 1 for the product ID, useRANDOMINT(10000)+1` for the product foreign key in the fact table - I hope that makes sense ...

  • moshegmosheg Vertica Employee Administrator

    Does sample data use make sense here?
    See: https://www.vertica.com/docs/latest/HTML/Content/Authoring/AnalyzingData/MachineLearning/DataPreparation/SamplingData.htm

    \set DEMO_ROWS 1000000
    DROP TABLE t1 CASCADE;
    DROP TABLE t2 CASCADE;
    
    create table public.t1 ( row_id int, f2 datetime);
    create table public.t2 ( row_id int, f2 datetime);
    
    -- Generate some data in t1
    INSERT INTO t1
    with myrows as (select
    row_number() over() as row_id, sysdate() + RANDOMINT(365) as f2
    from ( select 1 from ( select now() as se union all
    select now() + :DEMO_ROWS - 1 as se) a timeseries ts as '1 day' over (order by se)) b)
    select row_id, f2
    from myrows
    order by row_id;
    COMMIT;
    
    -- Insert random data from table t1 into table t2
    INSERT INTO t2 SELECT * FROM t1 TABLESAMPLE(0.001) limit 10;
    COMMIT;
    
    SELECT * FROM t1 ORDER BY 1 limit 10;
     row_id |             f2
    --------+----------------------------
          1 | 2021-02-19 09:56:55.156364
          2 | 2021-10-31 09:56:55.156364
          3 | 2021-03-11 09:56:55.156364
          4 | 2021-08-20 09:56:55.156364
          5 | 2021-01-10 09:56:55.156364
          6 | 2021-02-13 09:56:55.156364
          7 | 2021-03-09 09:56:55.156364
          8 | 2021-09-11 09:56:55.156364
          9 | 2020-12-03 09:56:55.156364
         10 | 2021-09-26 09:56:55.156364
    (10 rows)
    
    SELECT * FROM t2 ORDER BY 1;
     row_id |             f2
    --------+----------------------------
     114186 | 2021-07-11 09:56:55.156364
     141080 | 2021-02-01 09:56:55.156364
     308290 | 2021-01-17 09:56:55.156364
     327098 | 2021-01-04 09:56:55.156364
     340480 | 2021-08-12 09:56:55.156364
     485222 | 2021-07-02 09:56:55.156364
     554672 | 2021-09-16 09:56:55.156364
     769993 | 2020-11-28 09:56:55.156364
     956760 | 2021-06-30 09:56:55.156364
     960165 | 2021-11-01 09:56:55.156364
    (10 rows)
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file