Rule of thumb for picking shard count?

kxu Administrator

How should you pick the number of shards? Is there a rule of thumb based on data size?

Sizing and Configuring Vertica in Eon Mode for Different Use Cases
@skeswani @skamat

Answers

  • skeswani Employee

    The choice of shard count is not based on total database size; rather, it is based on the size of the active working set.
    Since the amount of data you can put in S3 is unbounded, total data size should not drive the choice of shard count.
    For example, suppose you store logs from machines but only ever query or process logs from the last year. Then your working set is one year's worth of data. Irrespective of the total data size, which could be 10 years or more, your choices and sizing should be based on that one year's worth of data.

    Usually 12 shards is a good number. For medium to large datasets, a number between 12 and 24 that has a lot of factors (divisors) is a good number to pick.

  • ChuckB Vertica Employee

    In addition to what @skeswani said about using enough machines to hold the depot you need, you might pick the shard count based on the ETL workload. You should have at least as many shards as there are machines in the ETL cluster, perhaps times a small factor for expansion. You may also need a higher shard count for the query workload if you have queries on lots of data, but most of the time query capacity is better achieved with multiple machines serving the same shard.
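
The two rules of thumb in the answers above can be combined into a rough sketch. This is only an illustration, not a Vertica API: the function names, the `expansion_factor` parameter, and the candidate list `(12, 24)` are assumptions made for this example. The reading that a shard count with many factors is useful because it divides evenly across many node counts is also an assumption about why the answer recommends it.

```python
def even_node_counts(shards: int) -> list[int]:
    """Node counts that give every node the same number of shards."""
    return [n for n in range(1, shards + 1) if shards % n == 0]


def min_shards_for_etl(etl_nodes: int, expansion_factor: int = 1) -> int:
    """Lower bound from the ETL rule: at least one shard per ETL machine,
    times a small factor for expansion (hypothetical parameter)."""
    return etl_nodes * expansion_factor


def pick_shard_count(etl_nodes: int, expansion_factor: int = 1,
                     candidates: tuple[int, ...] = (12, 24)) -> int:
    """Return the smallest candidate that meets the ETL lower bound.

    The candidates are assumed values taken from the 12 to 24 range
    mentioned in the thread. If none is large enough, fall back to the
    lower bound itself.
    """
    floor = min_shards_for_etl(etl_nodes, expansion_factor)
    for count in sorted(candidates):
        if count >= floor:
            return count
    return floor


if __name__ == "__main__":
    print(even_node_counts(12))     # [1, 2, 3, 4, 6, 12]
    print(pick_shard_count(4))      # 12
    print(pick_shard_count(8, 2))   # 24
```

With 12 shards, clusters of 1, 2, 3, 4, 6, or 12 nodes each get an equal share of shards. Note that the sketch does not account for working set or depot size, which the first answer names as the main sizing input.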
