How should you pick the number of shards? Is there a rule of thumb based on data size?
Sizing and Configuring Vertica in Eon Mode for Different Use Cases @skeswani @skamat
The choice of shard count is not based on total database size; rather, it is based on the size of the active working set.
Since the amount of data you can put in S3 is unbounded, the shard count should not be based on total data size.
For example, if you store logs from machines but only ever query or process logs from the last year, then your working set is one year's worth of data. Irrespective of the total data size, which could be 10 years or more, your choices and sizing should be based on that one year's worth of data.
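To make the log example concrete, here is a rough Python sketch of the arithmetic. The 50 GB/day ingest rate and the function name `working_set_gb` are made up for illustration; they are not Vertica settings or figures from this thread.

```python
def working_set_gb(daily_ingest_gb: int, queried_days: int) -> int:
    """Estimate the data volume that queries actually touch.

    daily_ingest_gb: hypothetical average ingest per day, in GB.
    queried_days: how far back queries and processing reach, in days.
    """
    return daily_ingest_gb * queried_days


# Hypothetical numbers: 50 GB of logs per day, kept for 10 years,
# but only the last year is ever queried.
total_gb = working_set_gb(50, 3650)   # everything stored in S3
active_gb = working_set_gb(50, 365)   # what sizing should be based on

print(f"total stored: {total_gb} GB, working set: {active_gb} GB")
```

Under these assumed numbers the working set is a tenth of what sits in S3, and it is that smaller figure that should drive the sizing.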
Usually 12 shards is a good number. For medium to large datasets, a number between 12 and 24 that has a lot of factors is a good one to pick.
In addition to what @skeswani said about using enough machines to hold the depot you need, you might pick the shard count based on the ETL workload. You should have at least as many shards as there are machines in the ETL cluster, perhaps times a small factor for expansion. You may also need a higher shard count for the query workload if you have queries on lots of data, but most of the time query capacity is better achieved with multiple machines serving the same shard.
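As a rough sketch of how the suggestions in this thread could be combined, the Python below takes the ETL node count times a small expansion factor and never goes below 12. The function name `suggest_shard_count` and the default factor of 1.5 are assumptions made for illustration, not an official Vertica formula.

```python
import math


def suggest_shard_count(etl_nodes: int,
                        expansion_factor: float = 1.5,
                        baseline: int = 12) -> int:
    """Heuristic shard count: a starting point, not a rule.

    etl_nodes: machines in the ETL cluster.
    expansion_factor: hypothetical headroom for growing the cluster.
    baseline: the "usually 12 is a good number" floor from this thread.
    """
    if etl_nodes < 1:
        raise ValueError("etl_nodes must be at least 1")
    # At least one shard per ETL machine, scaled up for expansion.
    from_etl = math.ceil(etl_nodes * expansion_factor)
    return max(baseline, from_etl)


print(suggest_shard_count(4))    # small ETL cluster: the baseline wins
print(suggest_shard_count(16))   # larger ETL cluster: node count drives it
```

Treat the result as a starting point and round to a nearby number with many factors if needed. The sketch deliberately ignores query load, since that is usually better handled by adding machines that serve the same shards.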