After JOIN statements run result something wrong when using Docker on M1 MacBook
- OS : macOS Monterey (12.3)
- CPU : M1 Pro
- RAM : 32GB
- Docker Image & version : vertica/vertica-ce:12.0.3-0
I am currently running a "verticaDB" using docker image on colima.
Thanks to the link below, the operation was successful.
https://forum.vertica.com/discussion/242629/support-apple-silicon-m1-for-docker-vertica-ce
However, when using JOIN statements, you can see that the result value is strange.
Below is the result of the first query.
SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b
mall_id | mall_id ---------+--------- 10 | 10 20 | 20 30 | 30 40 | 40 50 | 50 60 | 60 70 | 70 80 | 80 90 | 90 (9 rows)
This result is a valid result.
However, if run the query again, you can see that the garbage value is generated.
mall_id | mall_id ---------+--------- 10 | 10 10 | 20 <--- garbage value 20 | 20 20 | 30 <--- garbage value 30 | 30 30 | 40 <--- garbage value 40 | 40 40 | 50 <--- garbage value 50 | 50 50 | 60 <--- garbage value 60 | 60 60 | 70 <--- garbage value 70 | 70 70 | 80 <--- garbage value 80 | 80 80 | 90 <--- garbage value 90 | 90 90 | 1833761 <--- garbage value (18 rows)
You can see that strange values are included as shown above, and if do repeated, it will be treated as follows.
mall_id | mall_id ---------+--------------------- 10 | 10 10 | 20 10 | 30 20 | 20 20 | 30 20 | 40 30 | 30 30 | 40 30 | 50 40 | 40 40 | 50 40 | 60 50 | 50 50 | 60 50 | 70 60 | 60 60 | 70 60 | 80 70 | 70 70 | 80 70 | 90 80 | 80 80 | 90 80 | 8028074750225091121 90 | 90 90 | 8028074750225091121 90 | 7304687113783041902 (27 rows)
However, if you look at the results of the execution of Query execution explain, it seems normal.
EXPLAIN SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b Access Path: +-JOIN HASH [Cost: 291, Rows: 9] (PATH ID: 1) | Join Cond: (m.id = b.id) | +-- Outer -> STORAGE ACCESS for m [Cost: 145, Rows: 9] (PATH ID: 2) | | Projection: schema.member_super | | Materialize: m.id | | Runtime Filter: (SIP1(HashJoin): m.id) | +-- Inner -> STORAGE ACCESS for b [Cost: 145, Rows: 9] (PATH ID: 3) | | Projection: schema.board_super | | Materialize: b.id
To deal with this part, i rebuilt the docker image by referring to the official docker image repository below.
https://github.com/vertica/vertica-containers
Both the rpm package and the deb package were rebuilt with amd64 arch, but it was confirmed that the following error was spitting out.
vsql: could not connect to server: Cannot assign requested address Is the server running on host "???" and accepting TCP/IP connections on port 5433?
At this time, colima was set to use qemu-system-x86_64, and it was confirmed that the architecture was used normally in docker info
or colima status.
❯ colima status -p amd64 INFO[0000] colima [profile=amd64] is running using QEMU INFO[0000] arch: x86_64 INFO[0000] runtime: docker INFO[0000] mountType: sshfs INFO[0000] address: 192.168.106.3 INFO[0000] socket: unix:///Users/me/.colima/amd64/docker.sock ❯ docker info Client: Context: colima-amd64 Debug Mode: false Plugins: buildx: Docker Buildx (Docker Inc., v0.10.2) ... Kernel Version: 5.15.82-0-virt Operating System: Alpine Linux v3.16 OSType: linux Architecture: x86_64
I couldn't solve this problem because of the above error, and i'm using the official image by writing the docker-compose
file as below.
vertica: image: vertica/vertica-ce:12.0.3-0 container_name: vertica platform: linux/amd64 environment: VERTICA_DB_USER: "dbadmin" VERTICA_DB_PASSWORD: "" VERTICA_DB_NAME: "dev" VERTICA_MEMDEBUG: 2 TZ: "Asia/Seoul" volumes: - ./vertica/init-scripts:/docker-entrypoint-initdb.d ports: - "5433:5433" - "5444:5444"
If there are people like me who have experienced the problem of garbage value when using JOIN statement, I wonder how you solved it.
Thanks for in advance.
Answers
Please share the join statement and create tables DDL repo.
SELECT m.id, b.id FROM schema.member m INNER JOIN schema.board b
Produces Syntax error.
Please consider something like the following example:
-
@mosheg
First of all, thank you for the detailed example.
However, the DDL you wrote above is working normally, but my DDL is not working properly yet.
I am attaching a sample DDL to you.
The following is the result of performing the sample DDL as shown above.
First Run Query
Second Run Query
Query Explain
In this case, the above error appears, and the case where the actual garbage value is generated can be reproduced by using the DDL as below.
Result
Could you tell me how to solve this problem?
Hi @liquidbear ,
This is a wild guess, can you try your query this way?
Sorry I don't have a M1 macbook to test by myself.
@VValdar
Thank you for your advice.
However, we confirmed that the same problem occurred.
Result
On v12.0.2-0 I've got the error in the 2nd query run, only when some of the projections from previous run were not dropped:
ERROR 3149: Duplicate primary/unique key detected in join [(public.board_test x public.member_test) using board_test_super and member_test_super (PATH ID: 1)]; value [member_id=1]
Please add CASCADE to all "drop" statements.
And before running the test again please verify that all your tables and all projections are dropped.