query performance with multiple where clauses

edited June 2024 in General Discussion

hi,

I have a table with about 24 columns and >100Billion rows in the table

table is partitioned by a date only column.

I have a query which has to be executed on this table a 10k times a day with different params and needs to return about 8 columns

for the above scenario, i have created a projection which contains only the required columns and in the order by clause all the columns from the where clause are added

projection looks like

create projection test (dt , accountid, ts, ts_sort,tsday_sort,username, categoryid,someid, *,*,*,*,) as
select
        dt        ,
        accountid ,
        ts        ,
        ts_sort   ,
        tsday_sort,
        username  ,
        categoryid
from
        main_table
order by
        tsday_sort,
        ts_sort   ,
        accountid ,
        someid    ,
        blockedreason) segmented by (ts,accountid,dt) all nodes;

and when I query the table like below

select ts,accountid,*,*,*,*
from main_table
where tsday_sort between 100 and 200
and ts_sort between 2000000 and 2500000 
order by tsday_sort,ts_sort
limit 100

it runs faster 1-2s

but when I add one more predicate categoryid like

select ts,accountid,*,*,*,*
from main_table
where tsday_sort between 100 and 200
and ts_sort between 2000000 and 2500000 
and categoryid between 1 and 3
order by tsday_sort,ts_sort
limit 100

it runs for 15s+ every time.

Why is order by clause not helping in the performance here?

Answers

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file