Joins on multiple(more than two) tables - extremely slow
We've designed tables like following - we have a primary table + multiple tables for each attribute that can have multiple attributes(array values)
primary table: f_network_events(flow_id, comp_id, event_time, ip_src, ip_dst, port_src, port_dst, application,.......)
then a few individual tables like
file_name(flow_id, comp_id, file_name)
file_type(flow_id, comp_id, file_type)
server_name(flow_id, comp_id, server_name)
I have a basic understanding of JOINs - I know to avail merge join for two tables I have required projections ready i.e. order by flow_id, comp_id. The join on two tables works properly e.g. the JOIN on f_network_events and file_name
select * from f_network_events JOIN file_name on f_network_events.flow_id = file_name.flow_id and f_network_events.comp_id = file_name.comp_id;
since I have projections (with the order by flow_id, comp_id) on both tables, the query optimizer is picking correct projections as depicted in the EXPLAIN command.
However, If I'm trying to JOIN more than two tables the query optimizer is not picking the correct projections (its picking super projections) and the query is extremely slow.
select * from f_network_events JOIN file_name on f_network_events.flow_id = file_name.flow_id and f_network_events.comp_id = file_name.comp_id JOIN file_type on f_network_events.flow_id = file_type.flow_id and f_network_events.comp_id = file_type.comp_id ;
So, in brief, I have two issues
1. Join on more than two tables is extremely slow and why the correct projection is not getting picked up by the query optimizer.
2. Also, the JOIN on two tables is slow when I add order by event_timestamp in the query
select * from f_network_events JOIN file_name on f_network_events.flow_id = file_name.flow_id and f_network_events.comp_id = file_name.comp_id order by f_network_events .event_timestamp desc limit 100;
I tried giving query to the database designer and deployed suggested projections but still, there is no improvement in the query execution - they are extremely slow.
Note: tables have around ~ 300 million rows.