After JOIN statements run result something wrong when using Docker on M1 MacBook

liquidbearliquidbear Vertica Customer
edited February 2023 in General Discussion
  • OS : macOS Monterey (12.3)
  • CPU : M1 Pro
  • RAM : 32GB
  • Docker Image & version : vertica/vertica-ce:12.0.3-0

I am currently running a "verticaDB" using docker image on colima.

Thanks to the link below, the operation was successful.

https://forum.vertica.com/discussion/242629/support-apple-silicon-m1-for-docker-vertica-ce

However, when using JOIN statements, you can see that the result value is strange.

Below is the result of the first query.

SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b

 mall_id | mall_id
---------+---------
      10 |      10
      20 |      20
      30 |      30
      40 |      40
      50 |      50
      60 |      60
      70 |      70
      80 |      80
      90 |      90
(9 rows)

This result is a valid result.
However, if run the query again, you can see that the garbage value is generated.

mall_id | mall_id
---------+---------
      10 |      10
      10 |      20 <--- garbage value
      20 |      20
      20 |      30 <--- garbage value
      30 |      30
      30 |      40 <--- garbage value
      40 |      40
      40 |      50 <--- garbage value
      50 |      50
      50 |      60 <--- garbage value
      60 |      60 
      60 |      70 <--- garbage value
      70 |      70 
      70 |      80 <--- garbage value
      80 |      80
      80 |      90 <--- garbage value
      90 |      90
      90 | 1833761 <--- garbage value
(18 rows)

You can see that strange values are included as shown above, and if do repeated, it will be treated as follows.

mall_id |       mall_id
---------+---------------------
      10 |                  10
      10 |                  20
      10 |                  30
      20 |                  20
      20 |                  30
      20 |                  40
      30 |                  30
      30 |                  40
      30 |                  50
      40 |                  40
      40 |                  50
      40 |                  60
      50 |                  50
      50 |                  60
      50 |                  70
      60 |                  60
      60 |                  70
      60 |                  80
      70 |                  70
      70 |                  80
      70 |                  90
      80 |                  80
      80 |                  90
      80 | 8028074750225091121
      90 |                  90
      90 | 8028074750225091121
      90 | 7304687113783041902
(27 rows)

However, if you look at the results of the execution of Query execution explain, it seems normal.

EXPLAIN SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b

 Access Path:
 +-JOIN HASH [Cost: 291, Rows: 9] (PATH ID: 1)
 |  Join Cond: (m.id = b.id)
 | +-- Outer -> STORAGE ACCESS for m [Cost: 145, Rows: 9] (PATH ID: 2)
 | |      Projection: schema.member_super
 | |      Materialize: m.id
 | |      Runtime Filter: (SIP1(HashJoin): m.id)
 | +-- Inner -> STORAGE ACCESS for b [Cost: 145, Rows: 9] (PATH ID: 3)
 | |      Projection: schema.board_super
 | |      Materialize: b.id

To deal with this part, i rebuilt the docker image by referring to the official docker image repository below.
https://github.com/vertica/vertica-containers

Both the rpm package and the deb package were rebuilt with amd64 arch, but it was confirmed that the following error was spitting out.

vsql: could not connect to server: Cannot assign requested address
     Is the server running on host "???" and accepting
     TCP/IP connections on port 5433?

At this time, colima was set to use qemu-system-x86_64, and it was confirmed that the architecture was used normally in docker info or colima status.

❯ colima status -p amd64
INFO[0000] colima [profile=amd64] is running using QEMU 
INFO[0000] arch: x86_64                                 
INFO[0000] runtime: docker                              
INFO[0000] mountType: sshfs                             
INFO[0000] address: 192.168.106.3                       
INFO[0000] socket: unix:///Users/me/.colima/amd64/docker.sock 

❯ docker info                    
Client:
 Context:    colima-amd64
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.10.2)
...
Kernel Version: 5.15.82-0-virt
 Operating System: Alpine Linux v3.16
 OSType: linux
 Architecture: x86_64

I couldn't solve this problem because of the above error, and i'm using the official image by writing the docker-compose file as below.

vertica:
    image: vertica/vertica-ce:12.0.3-0
    container_name: vertica
    platform: linux/amd64
    environment:
      VERTICA_DB_USER: "dbadmin"
      VERTICA_DB_PASSWORD: ""
      VERTICA_DB_NAME: "dev"
      VERTICA_MEMDEBUG: 2
      TZ: "Asia/Seoul"
    volumes:
      - ./vertica/init-scripts:/docker-entrypoint-initdb.d
    ports:
      - "5433:5433"
      - "5444:5444"

If there are people like me who have experienced the problem of garbage value when using JOIN statement, I wonder how you solved it.

Thanks for in advance.

Tagged:

Answers

  • moshegmosheg Vertica Employee Administrator

    Please share the join statement and create tables DDL repo.

    SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b
    Produces Syntax error.

    Please consider something like the following example:

    create table member (id    int);
    create table board (id    int);
    
    \set DEMO_ROWS 10
    INSERT INTO member
    with myrows as (select
    row_number() over() as id
    from ( select 1 from ( select now() as se union all
    select now() + :DEMO_ROWS - 1 as se) a timeseries ts as '1 day' over (order by se)) b)
    select id
    from myrows
    order by id;
    COMMIT;
    
    SELECT COPY_TABLE ('public.member', 'public.board');
    
    SELECT m.id, b.id
    FROM member m
    INNER JOIN board b
    ON m.id=b.id
    ORDER BY 1;
    
  • liquidbearliquidbear Vertica Customer
    edited February 2023

    -

  • liquidbearliquidbear Vertica Customer
    edited February 2023

    @mosheg

    First of all, thank you for the detailed example.

    However, the DDL you wrote above is working normally, but my DDL is not working properly yet.

    I am attaching a sample DDL to you.

    drop table member_test;
    drop table board_test;
    
    create table member_test(
     member_id int not null primary key
    );
    
    
    create table board_test(
     board_id varchar(32) not null primary key,
     member_id int
    );
    
    START TRANSACTION;
    INSERT INTO member_test(member_id) VALUES(1);
    INSERT INTO member_test(member_id) VALUES(2);
    INSERT INTO member_test(member_id) VALUES(3);
    COMMIT;
    
    START TRANSACTION;
    INSERT INTO board_test(board_id, member_id) VALUES ('W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8', 1);
    INSERT INTO board_test(board_id, member_id) VALUES ('nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5', 2);
    INSERT INTO board_test(board_id, member_id)VALUES ('8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez', 3);
    COMMIT;
    
    SELECT *
    FROM member_test m
    INNER JOIN board_test b
    ON m.member_id=b.member_id
    ORDER BY 1;
    

    The following is the result of performing the sample DDL as shown above.

    First Run Query

    dbadmin@06421830995b()=> SELECT *
    epdb_dev-> FROM member_test m
    epdb_dev-> INNER JOIN board_test b
    epdb_dev-> ON m.member_id=b.member_id;
     member_id |             board_id             | member_id
    -----------+----------------------------------+-----------
             1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
             2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
             3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
    (3 rows)
    

    Second Run Query

    dbadmin@06421830995b(*)=> SELECT *
    epdb_dev-> FROM member_test m
    epdb_dev-> INNER JOIN board_test b
    epdb_dev-> ON m.member_id=b.member_id;
    ERROR 3149:  Duplicate primary/unique key detected in join [(public.board_test x public.member_test) using board_test_super and member_test_super (PATH ID: 1)]; value [member_id=1]
    

    Query Explain

     ------------------------------
     QUERY PLAN DESCRIPTION:
     ------------------------------
    
     EXPLAIN SELECT *
     FROM member_test m
     INNER JOIN board_test b
     ON m.member_id=b.member_id
     ;
    
     Access Path:
     +-JOIN HASH [Cost: 135, Rows: 3] (PATH ID: 1)
     |  Join Cond: (m.member_id = b.member_id)
     |  Materialize at Output: b.board_id
     | +-- Outer -> STORAGE ACCESS for b [Cost: 49, Rows: 3] (PATH ID: 2)
     | |      Projection: public.board_test_super
     | |      Materialize: b.member_id
     | |      Filter: (b.member_id IS NOT NULL)
     | |      Runtime Filter: (SIP1(HashJoin): b.member_id)
     | +-- Inner -> STORAGE ACCESS for m [Cost: 49, Rows: 3] (PATH ID: 3)
     | |      Projection: public.member_test_super
     | |      Materialize: m.member_id
    

    In this case, the above error appears, and the case where the actual garbage value is generated can be reproduced by using the DDL as below.

    drop table member;
    drop table board;
    
    create table member(
     member_id         int not null
    );
    
    
    create table board(
     board_id         varchar(32) not null,
     member_id int
    );
    
    
    START TRANSACTION;
    INSERT INTO member(member_id) VALUES(1);
    INSERT INTO member(member_id) VALUES(2);
    INSERT INTO member(member_id) VALUES(3);
    COMMIT;
    
    START TRANSACTION;
    INSERT INTO board(board_id, member_id) VALUES ('W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8', 1);
    INSERT INTO board(board_id, member_id) VALUES ('nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5', 2);
    INSERT INTO board(board_id, member_id)VALUES ('8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez', 3);
    COMMIT;
    
    
    SELECT *
    FROM member m
    INNER JOIN board b
    ON m.member_id=b.member_id;
    ORDER BY 1;
    

    Result

    dbadmin@354ddc2c3d8a()=> SELECT *
    epdb_dev-> FROM member m
    epdb_dev-> INNER JOIN board b
    epdb_dev-> ON m.member_id=b.member_id;
          member_id      |             board_id             | member_id
    ---------------------+----------------------------------+-----------
                       1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
                       2 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
                       3 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
     7883960210924265844 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
     3906083473050778213 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
     4121135822717137972 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
     7953186949000202802 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
                       2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
                       3 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
     7883960210924265844 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
     3906083473050778213 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
     4121135822717137972 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
     7953186949000202802 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
     8098984466286210401 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
                       3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
     7883960210924265844 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
     3906083473050778213 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
     4121135822717137972 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
     7953186949000202802 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
     8098984466286210401 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
     7953205644807332452 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
    (21 rows)
    

    Could you tell me how to solve this problem?

  • VValdarVValdar Vertica Employee Employee

    Hi @liquidbear ,

    This is a wild guess, can you try your query this way?

    SELECT *
    FROM member m
    INNER JOIN board b /*+ EARLY_MATERIALIZATION */
    ON m.member_id=b.member_id;
    

    Sorry I don't have a M1 macbook to test by myself.

  • liquidbearliquidbear Vertica Customer

    @VValdar

    Thank you for your advice.

    However, we confirmed that the same problem occurred.

    Result

    epdb_dev-> FROM member m
    epdb_dev-> INNER JOIN board b /*+ EARLY_MATERIALIZATION */
    epdb_dev-> ON m.member_id=b.member_id;
     member_id |             board_id             | member_id
    -----------+----------------------------------+-----------
             1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
             2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
             3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
    (3 rows)
    
    
    dbadmin@0dd25399f0a7(*)=> SELECT *
    epdb_dev-> FROM member m
    epdb_dev-> INNER JOIN board b /*+ EARLY_MATERIALIZATION */
    epdb_dev-> ON m.member_id=b.member_id;
     member_id |             board_id             | member_id
    -----------+----------------------------------+-----------
             1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
             2 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 |         1
             2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
             3 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 |         2
             3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
             0 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez |         3
    (6 rows)
    
  • moshegmosheg Vertica Employee Administrator

    On v12.0.2-0 I've got the error in the 2nd query run, only when some of the projections from previous run were not dropped:
    ERROR 3149: Duplicate primary/unique key detected in join [(public.board_test x public.member_test) using board_test_super and member_test_super (PATH ID: 1)]; value [member_id=1]
    Please add CASCADE to all "drop" statements.
    And before running the test again please verify that all your tables and all projections are dropped.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file