Optimized Projection

faridahmad · June 2019

I have 2 below queries. I want to improve the performance of these queries by designing optimized projection or any other way to achieve the same. Can you help me out here.

Query1:

SELECT "TABLE2”.”coulmn1” AS “c1”
FROM "SCHEMA"."TABLE1" "TABLE1"
  LEFT JOIN "SCHEMA"."TABLE2" "TABLE2" ON ("TABLE1"."id" = "TABLE2"."id")
WHERE (((CASE WHEN (CAST( (12 * DATE_PART('YEAR', CURRENT_DATE) + DATE_PART('MONTH', CURRENT_DATE)) - (12 * DATE_PART('YEAR', "TABLE1"."transaction_date") + DATE_PART('MONTH', "TABLE1"."transaction_date")) AS BIGINT) <= 1) THEN 1 ELSE 0 END) = 1) AND ("TABLE1"."t_id" IN (SELECT t_id FROM SCHEMA.TABLE3 WHERE c_id in (select c_id FROM SCHEMA.TABLE3))))
GROUP BY 1
ORDER BY 1 ASC;

Query2:

SELECT "TABLE1"."account_type_display" AS "account_type_display"
FROM "SCHEMA"."TABLE1" "TABLE1"
WHERE (((CASE WHEN (CAST( (12 * DATE_PART('YEAR', CURRENT_DATE) + DATE_PART('MONTH', CURRENT_DATE)) - (12 * DATE_PART('YEAR', "TABLE1"."transaction_date") + DATE_PART('MONTH', "TABLE1"."transaction_date")) AS BIGINT) <= 1) THEN 1 ELSE 0 END) = 1) AND ("TABLE1"."t_id" IN (SELECT t_id FROM SCHEMA.TABLE3 WHERE c_id in (select c_id FROM SCHEMA.TABLE3))))
GROUP BY 1
ORDER BY 1 ASC;

I have 20 more queries which are very similar to these 2 queries.

Bryan_H · June 2019

Have you run an explain plan and/or profile on any of the queries? This will help identify any bottlenecks. In particular you'll want to analyze statistics if the explain plan shows NO STATISTICS on any query, and create a new projection with correct grouping, ordering, segmentation to match the plan.

The quickest way is to copy these two queries - or possibly all queries - to a file and run this through DataBase Designer (DBD) from admintools or Management Console. You can run an incremental design only on the schema(s) and table(s) in the queries using the query file, then analyze statistics and build the design. In case of incremental design, I usually don't deploy automatically, instead I download and review the proposed projections to verify they seem reasonable.

faridahmad · June 2019

@Bryan_H : Yes I ran explain plan and found both query were showing NO STATISTICS, so I analyze statistics. I used DBD to create projection by using comprehensive design. But the projection suggested by DBD containing 20 columns out of 30 columns in segmentation which doesn't seem reasonable.

skeswani · June 2019

can you provide the create table statements for the tables?

faridahmad · June 2019

Here is the create table statement:

CREATE TABLE SCHEMA.TABLE1
(
id(200) NOT NULL,
transaction_id varchar(200) NOT NULL,
transaction_date date,
coulmn3 varchar(200),
coulmn4 varchar(200),
coulmn5 varchar(200)
)

remaining tables also have same create statement.

Vertica_Curtis · June 2019

It kind of depends. For these tables, definitely make sure they are all segmented by "id". DBD will often pick many columns, because it wants to ensure uniqueness. If "id" is not already the Primary Key of the table, then make it so. That will help the optimizer. DBD might not be picking just ID because it's not the PK.

I don't know what you're trying to do in that "case" statement, but I can't help but think that datediff() might be able to do that easier. Check out that function. That will likely improve your performance here.

Lastly, in terms of projection design, you've got two options that I can see. You can either optimize for the GROUP BY, or you can optimize for the join. If you optimize for the join, then order everything by "id". That's going to get you merge joins. If the group by operation is costly, you could optimize by that. So, whichever columns are in your group by, make that your order by clause. You could also include the "id" in the order by as the last column. So, using query two as an example, your projection order by would be "account_type_display, id".

Vertica_Curtis · June 2019

Also, make id an int if it's not already.

faridahmad · June 2019

@Vertica_Curtis : Thanks for insightful information.Earlier Hash join was being used but after your suggestion I kept join column in projection's order by, so now merge join being used. Now my biggest concern is group by operation, as I am executing a left join query and performing group by on 33 columns which causing a GroupBy Spill.I checked the explain plan and seeing GROUP BY HASH is being used. so, to use GROUP BY PIPE I kept all group by columns in TABLE1 and TABLE2 projection's order by clause. Now GROUP BY PIPE is being used but only when I am selecting TABLE1 columns, if trying to select TABLE2 columns, again GROUP BY HASH being used.Since out of 33 group by columns 11 are from TABLE2. So I can't miss those. How can I select these 33 columns with GROUP BY PIPE algorithm being used?

Vertica_Curtis · June 2019

It's not always possible to solve a join issue AND a group by issue, as they often have competing goals. Group by pipelined is not possible if the columns from the group by span across two different tables. To speed that up, your best bet is to put them all into a single flattened table, and then you can create a projection on that ordered in the appropriate way - to get you a pipelined group by. That might be your best option.

We're Moving!

Create My New Community Account Now

Optimized Projection

Comments

Leave a Comment