After JOIN statements run result something wrong when using Docker on M1 MacBook
- OS : macOS Monterey (12.3)
- CPU : M1 Pro
- RAM : 32GB
- Docker Image & version : vertica/vertica-ce:12.0.3-0
I am currently running a "verticaDB" using docker image on colima.
Thanks to the link below, the operation was successful.
https://forum.vertica.com/discussion/242629/support-apple-silicon-m1-for-docker-vertica-ce
However, when using JOIN statements, you can see that the result value is strange.
Below is the result of the first query.
SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b
mall_id | mall_id
---------+---------
10 | 10
20 | 20
30 | 30
40 | 40
50 | 50
60 | 60
70 | 70
80 | 80
90 | 90
(9 rows)
This result is a valid result.
However, if run the query again, you can see that the garbage value is generated.
mall_id | mall_id
---------+---------
10 | 10
10 | 20 <--- garbage value
20 | 20
20 | 30 <--- garbage value
30 | 30
30 | 40 <--- garbage value
40 | 40
40 | 50 <--- garbage value
50 | 50
50 | 60 <--- garbage value
60 | 60
60 | 70 <--- garbage value
70 | 70
70 | 80 <--- garbage value
80 | 80
80 | 90 <--- garbage value
90 | 90
90 | 1833761 <--- garbage value
(18 rows)
You can see that strange values are included as shown above, and if do repeated, it will be treated as follows.
mall_id | mall_id
---------+---------------------
10 | 10
10 | 20
10 | 30
20 | 20
20 | 30
20 | 40
30 | 30
30 | 40
30 | 50
40 | 40
40 | 50
40 | 60
50 | 50
50 | 60
50 | 70
60 | 60
60 | 70
60 | 80
70 | 70
70 | 80
70 | 90
80 | 80
80 | 90
80 | 8028074750225091121
90 | 90
90 | 8028074750225091121
90 | 7304687113783041902
(27 rows)
However, if you look at the results of the execution of Query execution explain, it seems normal.
EXPLAIN SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b Access Path: +-JOIN HASH [Cost: 291, Rows: 9] (PATH ID: 1) | Join Cond: (m.id = b.id) | +-- Outer -> STORAGE ACCESS for m [Cost: 145, Rows: 9] (PATH ID: 2) | | Projection: schema.member_super | | Materialize: m.id | | Runtime Filter: (SIP1(HashJoin): m.id) | +-- Inner -> STORAGE ACCESS for b [Cost: 145, Rows: 9] (PATH ID: 3) | | Projection: schema.board_super | | Materialize: b.id
To deal with this part, i rebuilt the docker image by referring to the official docker image repository below.
https://github.com/vertica/vertica-containers
Both the rpm package and the deb package were rebuilt with amd64 arch, but it was confirmed that the following error was spitting out.
vsql: could not connect to server: Cannot assign requested address
Is the server running on host "???" and accepting
TCP/IP connections on port 5433?
At this time, colima was set to use qemu-system-x86_64, and it was confirmed that the architecture was used normally in docker info or colima status.
❯ colima status -p amd64 INFO[0000] colima [profile=amd64] is running using QEMU INFO[0000] arch: x86_64 INFO[0000] runtime: docker INFO[0000] mountType: sshfs INFO[0000] address: 192.168.106.3 INFO[0000] socket: unix:///Users/me/.colima/amd64/docker.sock ❯ docker info Client: Context: colima-amd64 Debug Mode: false Plugins: buildx: Docker Buildx (Docker Inc., v0.10.2) ... Kernel Version: 5.15.82-0-virt Operating System: Alpine Linux v3.16 OSType: linux Architecture: x86_64
I couldn't solve this problem because of the above error, and i'm using the official image by writing the docker-compose file as below.
vertica:
image: vertica/vertica-ce:12.0.3-0
container_name: vertica
platform: linux/amd64
environment:
VERTICA_DB_USER: "dbadmin"
VERTICA_DB_PASSWORD: ""
VERTICA_DB_NAME: "dev"
VERTICA_MEMDEBUG: 2
TZ: "Asia/Seoul"
volumes:
- ./vertica/init-scripts:/docker-entrypoint-initdb.d
ports:
- "5433:5433"
- "5444:5444"
If there are people like me who have experienced the problem of garbage value when using JOIN statement, I wonder how you solved it.
Thanks for in advance.
Answers
Please share the join statement and create tables DDL repo.
SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b
Produces Syntax error.
Please consider something like the following example:
create table member (id int); create table board (id int); \set DEMO_ROWS 10 INSERT INTO member with myrows as (select row_number() over() as id from ( select 1 from ( select now() as se union all select now() + :DEMO_ROWS - 1 as se) a timeseries ts as '1 day' over (order by se)) b) select id from myrows order by id; COMMIT; SELECT COPY_TABLE ('public.member', 'public.board'); SELECT m.id, b.id FROM member m INNER JOIN board b ON m.id=b.id ORDER BY 1;-
@mosheg
First of all, thank you for the detailed example.
However, the DDL you wrote above is working normally, but my DDL is not working properly yet.
I am attaching a sample DDL to you.
drop table member_test; drop table board_test; create table member_test( member_id int not null primary key ); create table board_test( board_id varchar(32) not null primary key, member_id int ); START TRANSACTION; INSERT INTO member_test(member_id) VALUES(1); INSERT INTO member_test(member_id) VALUES(2); INSERT INTO member_test(member_id) VALUES(3); COMMIT; START TRANSACTION; INSERT INTO board_test(board_id, member_id) VALUES ('W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8', 1); INSERT INTO board_test(board_id, member_id) VALUES ('nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5', 2); INSERT INTO board_test(board_id, member_id)VALUES ('8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez', 3); COMMIT; SELECT * FROM member_test m INNER JOIN board_test b ON m.member_id=b.member_id ORDER BY 1;The following is the result of performing the sample DDL as shown above.
First Run Query
dbadmin@06421830995b()=> SELECT * epdb_dev-> FROM member_test m epdb_dev-> INNER JOIN board_test b epdb_dev-> ON m.member_id=b.member_id; member_id | board_id | member_id -----------+----------------------------------+----------- 1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 (3 rows)Second Run Query
Query Explain
In this case, the above error appears, and the case where the actual garbage value is generated can be reproduced by using the DDL as below.
drop table member; drop table board; create table member( member_id int not null ); create table board( board_id varchar(32) not null, member_id int ); START TRANSACTION; INSERT INTO member(member_id) VALUES(1); INSERT INTO member(member_id) VALUES(2); INSERT INTO member(member_id) VALUES(3); COMMIT; START TRANSACTION; INSERT INTO board(board_id, member_id) VALUES ('W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8', 1); INSERT INTO board(board_id, member_id) VALUES ('nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5', 2); INSERT INTO board(board_id, member_id)VALUES ('8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez', 3); COMMIT; SELECT * FROM member m INNER JOIN board b ON m.member_id=b.member_id; ORDER BY 1;Result
dbadmin@354ddc2c3d8a()=> SELECT * epdb_dev-> FROM member m epdb_dev-> INNER JOIN board b epdb_dev-> ON m.member_id=b.member_id; member_id | board_id | member_id ---------------------+----------------------------------+----------- 1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 2 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 3 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 7883960210924265844 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 3906083473050778213 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 4121135822717137972 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 7953186949000202802 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 3 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 7883960210924265844 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 3906083473050778213 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 4121135822717137972 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 7953186949000202802 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 8098984466286210401 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 7883960210924265844 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 3906083473050778213 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 4121135822717137972 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 7953186949000202802 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 8098984466286210401 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 7953205644807332452 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 (21 rows)Could you tell me how to solve this problem?
Hi @liquidbear ,
This is a wild guess, can you try your query this way?
Sorry I don't have a M1 macbook to test by myself.
@VValdar
Thank you for your advice.
However, we confirmed that the same problem occurred.
Result
epdb_dev-> FROM member m epdb_dev-> INNER JOIN board b /*+ EARLY_MATERIALIZATION */ epdb_dev-> ON m.member_id=b.member_id; member_id | board_id | member_id -----------+----------------------------------+----------- 1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 (3 rows) dbadmin@0dd25399f0a7(*)=> SELECT * epdb_dev-> FROM member m epdb_dev-> INNER JOIN board b /*+ EARLY_MATERIALIZATION */ epdb_dev-> ON m.member_id=b.member_id; member_id | board_id | member_id -----------+----------------------------------+----------- 1 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 2 | W5pJt6mH9z5q2HSlcm1N9W4y43jzgoX8 | 1 2 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 3 | nWc2nb4r2JOukPs4aIwtpmIaH7Q6rpr5 | 2 3 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 0 | 8sEvAOzV4sdqguWpAzkkEwGyxrDpO6Ez | 3 (6 rows)On v12.0.2-0 I've got the error in the 2nd query run, only when some of the projections from previous run were not dropped:
ERROR 3149: Duplicate primary/unique key detected in join [(public.board_test x public.member_test) using board_test_super and member_test_super (PATH ID: 1)]; value [member_id=1]
Please add CASCADE to all "drop" statements.
And before running the test again please verify that all your tables and all projections are dropped.