Need suggestion to set up a production cluster for Vertica Eon on MinIO
We’re planning to set up production clusters using Vertica Eon on MinIO. We have read the Vertica documentation about the number of shards, which recommends a shard count and initial node count based on working data size (see the table referenced below).
Currently we have a 260TB Vertica license, which is a bit larger than the “Extra large” cluster type (up to 192TB) in the Vertica documentation.
We have 80 new servers plus 26 older servers, and we need to use some of them for the MinIO cluster as well. We also need at least 3 subclusters (Ingest + Query + Query).
So, could you give us a suggestion about the number of shards and the number of Vertica nodes?
We also found that Vertica 10.x implements elastic crunch scaling (ECS) for query performance scaling.
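As a rough illustration of how shard count and node count interact (this is plain arithmetic, not Vertica code; the function name is ours):

```python
def shards_per_node(shard_count: int, node_count: int) -> float:
    """Average number of shard subscriptions each subcluster node processes.

    When node_count <= shard_count, each node handles one or more shards.
    When node_count > shard_count, elastic crunch scaling (ECS) lets several
    nodes split the work of a single shard, so the ratio drops below 1.
    """
    return shard_count / node_count

# Example: a 40-shard database.
print(shards_per_node(40, 40))  # 1.0 -> one shard per node
print(shards_per_node(40, 20))  # 2.0 -> each node covers two shards
print(shards_per_node(40, 80))  # 0.5 -> ECS: two nodes split each shard
```

This is why the shard count matters at creation time: it sets the granularity at which subclusters of different sizes can divide the data.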
New 80 servers:
Model: ProLiant DL360 Gen10
CPU: Intel Xeon Gold 6240 @ 2.60GHz
RAM: 768GB
Disk: 7680GB SSD x 2 (RAID 1)
Network: 50Gbps (25Gbps + 25Gbps aggregated)
8 old servers:
Model: ProLiant DL360 Gen9
CPU: Intel Xeon E5-2690 v3 @ 2.60GHz
RAM: 512GB
Disk: 900GB HDD x 2 (RAID 1) for OS, 1920GB SSD x 5 (RAID 5) for Data
Network: 10Gbps (Active/Standby) [We’re checking with the network team whether these links can be aggregated]
18 old servers:
Model: ProLiant DL360 Gen10
CPU: Intel Xeon Gold 6150 (13 servers) / 6126 (5 servers) @ 2.70GHz
RAM: 512GB
Disk: 7680GB SSD x 2 (RAID 1)
Network: 10Gbps (Active/Standby) [We’re checking with the network team whether these links can be aggregated]
Refer: https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/Eon/SizingEonCluster.htm
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/Eon/Elasticity.htm
Best Answer
mosheg Vertica Employee Administrator
It is advised to consult with a MinIO expert and your Vertica technical manager or Vertica PS, or to open a support case.
At a high level, consider the following example options.

Option A - (Low concurrency, data compression ratio equal to or greater than 2, with no DR)
This option assumes enough disk space for Communal Storage using only the old servers for MinIO.
MinIO cluster (old nodes): 8 x 1,920 GB = 15.4 TB
MinIO cluster (old nodes): 18 x 7,680 GB = 138.2 TB
Total for Communal Storage: ~153.6 TB
DB Shards: 40
1 x Primary Ingest cluster, for DDL statements and data loading: 40 new nodes
1 x Subcluster for running queries: 40 new nodes

Option B - (High concurrency, data compression ratio equal to or greater than 2, with no DR)
The same MinIO configuration as in option A.
DB Shards: 20
1 x Primary Ingest cluster for DDL statements and data loading: 20 new nodes
3 x Subclusters for running queries, each with 20 new nodes

Option C - (Data compression ratio equal to or lower than 2, with no DR)
This option assumes the old servers will not be enough to satisfy the required Communal Storage size for MinIO.
MinIO cluster (old nodes): 8 x 1,920GB = 15.4 TB
MinIO cluster (old nodes): 18 x 7,680GB = 138.2 TB
MinIO cluster (new nodes): 20 x 7,680GB = 153.6 TB
Total for Communal Storage: ~307.2 TB
DB Shards: 20
1 x Primary Ingest Cluster for DDL statements and data loading: 20 new nodes
2 x Subclusters for running queries, each with 20 new nodes

Option D - (Data compression ratio equal to or greater than 2, with DR)
DB Shards: 20
MinIO cluster (old nodes): 8 x 1,920GB = 15.4 TB
MinIO cluster (old nodes): 18 x 7,680GB = 138.2TB
Total for Main Cluster Communal Storage: ~153.6 TB
MinIO cluster (new nodes): 20 x 7,680 GB = 153.6 TB
Total for DR Cluster Communal Storage: ~153.6 TB
1 x Primary Main cluster for DDL statements and data loading: 20 new nodes
1 x Subcluster for running queries: 20 new nodes
1 x Separate Primary DR cluster for DDL statements, loading, and queries: 20 new nodes
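The capacity arithmetic in the options above can be double-checked with a few lines of Python. Note this counts raw drive capacity only; MinIO erasure coding or replication overhead would reduce the usable figure:

```python
GB_PER_TB = 1000  # the figures above use decimal TB

def raw_tb(nodes: int, drive_gb: int) -> float:
    """Raw data-drive capacity contributed to communal storage, in TB."""
    return nodes * drive_gb / GB_PER_TB

old_small = raw_tb(8, 1920)    # 8 old Gen9 nodes (SSD RAID 5 volume)
old_large = raw_tb(18, 7680)   # 18 old Gen10 nodes
new_part  = raw_tb(20, 7680)   # 20 new nodes (options C/D only)

print(round(old_small, 1))                          # 15.4
print(round(old_large, 1))                          # 138.2
print(round(old_small + old_large, 1))              # 153.6 -> options A/B/D main
print(round(old_small + old_large + new_part, 1))   # 307.2 -> option C
```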
Answers
Thanks @mosheg. I've asked Vertica technical experts (@marcothesane & Maurizio), but it seems they're quite busy now. Haha. My Vertica presales consultant suggested I post in the Vertica Forum to get different opinions from other experts here. Anyway, we'll try to raise a Vertica support case as well.
Any other suggestions from everyone are welcome.
The above examples were suggested to address different scenarios.
Before we try to detail the right sizing, consider the following questions.
1. What will be the real physical database size on disk, 3 or 5 years ahead?
To answer this, one should ask:
What is the current database size you start with?
How much data will be added daily?
How long do you need to keep the data? What is the data retention policy?
Do you need to keep historical data? How much? For how long?
What is your database compression ratio?
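Question 1 can be turned into a back-of-the-envelope projection. The sketch below is ours, and every input value in the example is hypothetical, not taken from this thread:

```python
def projected_disk_tb(current_raw_tb: float,
                      daily_growth_tb: float,
                      years: int,
                      compression_ratio: float) -> float:
    """Estimate on-disk database size after `years`, assuming all new data
    is retained (no purging) and a constant compression ratio."""
    raw = current_raw_tb + daily_growth_tb * 365 * years
    return raw / compression_ratio

# Hypothetical inputs: 260 TB raw today, 0.1 TB/day growth,
# a 3-year horizon, and 2:1 compression.
print(round(projected_disk_tb(260, 0.1, 3, 2.0), 2))  # 184.75 TB on disk
```

A retention policy that purges old data would cap the growth term instead of letting it accumulate for the full horizon.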
2. Do you have different SLA requirements (e.g. slower historic data queries)?
Can you benefit from different performance based on different data, storage tiers or clusters?
Do your long-running queries on a 10-node cluster run much faster on a 20-node cluster?
3. How much disk space is needed for ETL activities?
What is the total size of the files planned to be loaded daily?
How long do you need to keep those files after the load?
4. How many concurrent activities are planned at peak hour (queries, load, updates, backup, etc.)?
5. Do you need a DR cluster?
To answer this, one should ask:
Based on your organization's backup policy, you will miss some data changes between backups.
In the worst scenario, how much data will be lost after a restore?
In case of a disaster (power outage, human error, etc.), how long will it take to recover from backup?
How long is the organization willing to wait until the system is available again?
One year ahead, what will be the business impact for each down time hour?
Can you configure a dual load for active/active DR cluster?
Can you use external tables to query files (slower) when database is not available?
6. Which mode will be more beneficial for your database, Eon or EE mode?
Eon mode has amazing benefits, but sometimes EE might be more valuable in your specific situation.
To answer this, one should ask:
Do you plan to work in the cloud? How much?
Does your on-prem hardware already include separate compute servers and separate storage?
If you plan to use Eon/MinIO, does it make sense to configure powerful servers to serve as S3 storage just because they have direct-attached storage, although most of their compute power will not be used?
Eon is preferred when your S3 storage provides durability (e.g. no need to set up RAID, copies, backups, etc.).
Is that your case?
In EON, node recovery is faster, with no need for a redundant node, and nodes can be easily added or removed. However, if you have K-SAFE 1 and enough redundant nodes or DR active/active cluster, node recovery time is less relevant.
Is Eon mode's flexibility to un-provision the database by shutting down all the compute nodes in the cluster something you need (on prem)?
Maybe an EE cluster will utilize your given hardware more efficiently?
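One sizing caveat on the "S3 durability" point: MinIO provides durability through erasure coding, which consumes part of the raw drive capacity for parity. The sketch below is a simplification we wrote ourselves; the actual erasure-set size and default parity (e.g. EC:4 in recent MinIO releases) depend on your MinIO version and drive layout, so verify against your deployment before sizing:

```python
def minio_usable_tb(raw_tb: float, set_size: int = 16, parity: int = 4) -> float:
    """Rough usable capacity under MinIO erasure coding.

    Of each erasure set of `set_size` drives, `parity` drives' worth of
    space holds parity blocks. This ignores erasure-set layout details,
    so treat the result as an approximation, not an exact number.
    """
    return raw_tb * (set_size - parity) / set_size

# E.g. a ~153.6 TB raw pool with 16-drive sets and EC:4 parity:
print(round(minio_usable_tb(153.6), 1))  # 115.2 TB usable
```

In other words, the raw-capacity totals in the options above should be discounted by the erasure-coding overhead when comparing them against the required communal storage size.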
@mosheg Thank you for helping with a lot of questions.
Let me tell you a bit about our background. We have used Vertica EE for a long time (5+ years) and have kept expanding our clusters over time.
At the beginning we had 8 nodes x 3 clusters (24 nodes total across 2 DCs [2/1]) with a 31TB license. For ETL, we used SSIS (extracting data from MSSQL), and we used copycluster (vbr.py) to copy to the 2nd & 3rd clusters.
Then we expanded our clusters to 16 nodes each (48 nodes total across 2 DCs [2/1]) with a 51TB license. We continued with the same method for ETL and for copying data between clusters.
For the first two generations above, we built the data warehouse on Vertica.
After that, Hadoop came into the game, so our roadmap changed: build the data warehouse on Hadoop and then sync part of it to Vertica with our in-house ETL tool and in-house sync tool. Our source is still MSSQL. The ETL tool extracts data from MSSQL and builds the data warehouse on Hadoop; the sync tool syncs data from Hadoop to anywhere we want, with an option to filter time windows of data.
We then added more capacity to the Vertica clusters because we have more than 1K tables plus a huge new data table for analyzing user behaviour on our platform.
So currently we have 26 nodes (25 active + 1 standby) x 4 clusters across 2 DCs [3/1] with a 260TB license. The size is now too big to copycluster across DCs (different countries). For the 2 main clusters in the 1st DC, we still run copycluster every morning, in parallel with the sync tools loading to both clusters, though not to all tables yet.
We now plan to move our main DC to a different DC, which is why we bought a new set of servers. At the time we bought them, Vertica Eon on premises had not been released yet, so we planned to have the same number of nodes and clusters: 26 x 3 = 78, plus 2 spares = 80. In the meantime we tested Vertica Eon on AWS against other database-as-a-service offerings such as BigQuery, Snowflake, etc.
Finally, Vertica Eon on premises was released. We have run a POC of Vertica Eon on premises on HDFS, but it is tied to specific HDFS versions, which is why we're going with MinIO.
Don't get me started on how much we have suffered with EE. Hahaha.
Thank you oBoMBaYo,
To be more specific, let's continue the discussion via email or open a support case.
Thanks Mosheg for emphasizing that we should involve MinIO for their expertise.
FYI, we've opened a support case: SD02776243.