Optimizing DISTANCE using projections
zun
Community Edition User
Hi folks,
I am currently using the DISTANCE function in vertica. I'm not getting great performance, so wanted to get your opinion on if this can be optimized. If anyone has optimized DISTANCE using projections before, your opinion would be greatly appreciated.
My Query:
SELECT Date_trunc('month', modified_date), APPROXIMATE_COUNT_DISTINCT(userId) AS distinct_user_count FROM mySchema.userProjection WHERE modified_date BETWEEN '2021-01-01' AND '2022-01-31' AND country = 'US' AND DISTANCE(40.71427, -74.00597, userLatitude, userLongitude) < 160 GROUP BY 1 ORDER BY 1
My Projection:
CREATE PROJECTION mySchema.userProjection ( country ENCODING RLE, modified_date ENCODING ZSTD_FAST_COMP, userLatitude ENCODING ZSTD_FAST_COMP, userLongitude ENCODING ZSTD_FAST_COMP, userId ENCODING ZSTD_FAST_COMP ) AS SELECT myTable.country, myTable.modified_date, myTable.userLatitude, myTable.userLongitude, myTable.userId, FROM mySchema.myTable ORDER BY myTable.country, myTable.userLatitude, myTable.userLongitude SEGMENTED BY hash(myTable.country, myTable.userId, myTable.modified_date) ALL NODES;
The table is partitioned by modified_date. Query currently takes about 16s, I was really aiming for <5s.
My guess is DISTANCE
might not be using the projection correctly.
Appreciate any help!
Tagged:
0
Answers
Can you check which operator took time? Does myTable have modified_date column?
@Hibiki How can I figure out which operator took more time? Yes, the table has a modified_date column. The table is partitioned on modified_date (specifically partitioned on Date_trunc('month', modified_date))
Please collect the query profile by using PROFILE clause.
You can see the transaction ID and statement ID as below.
Then, please execute the following query to collect the time for each operator.
You can get the query plan as well by the following query.
Please replace the transaction ID and statement ID by yours.