Big Tables and Partitioning
5 Machines Cluster
16 GB of RAM and 4 CPU’s on each node.
Storage – regular attached NFS disks. (with enough disk apace)
I have created two big FACT & DIM tables for Multi Tenants usage...
those tables are generic tables for all Tenants Usage.
Two main fields distinct the Tenants : TenentId & EntityId.
Also I've created a simulator that fill-up this tables (of course we have metadata layer as well) the logical schema is sales + and it dimension Products+Stores+Regions….etc
The goals of this POC are:
· MOST Queries response time should be around 1 second
· Find out, the best segmentation and partitioning design for the BIG Table design?
· Find out, how many tenants and rows per tenant can hold one set of FACT+DIM?
· How to decide when to open a new set of FACT+DIM?
· How large amount of long string effect results, and how to solve it?
The FACT table definition include 100 Dates , 100 Numeric(35,6) and 100 Varchar(4096) , and ID (AUTO_INCREMENT) as Primary Key
segmentation by hash(TenantId)
partition by (TenantId||EntityId)
I have 2 main projections for the FACT table (super projection that has all the columns, and small one that include only the relevant columns according to the queries)
Now, starting with 5 Tenants and only one logical table and 1,000,000 per Tenant the results are inconclusive and I have some questions:
1. Is it recommend to have same fields for Segmentation and Partitioning ?
2. Partitioning , partitioning upon the super projection has 1 partition per tenant which is fine.
3. The explain plane do not show the partitioning usage, even if I use TenantId||EntityId in the WHERE Clause , WHY ?
4. However, for the other small projection, it created 1 or 2 partitions per node. – is this normal? I would expect 1 partition per Tenant||Entity?
5. There are 3 aggregated SQL’s , and the response time are 1.5 , 0.5 , 0.39 (second)
6. Adding additional 10Millions rows per Tenant to the FACT increased the number of partitions to 240 and response time to above 2 minutes…WHY?
HP Big Data, DBA
Phone: +972 3 5397019
Cell : +972 54 5597107