How can I estimate the storage cost of a single row?

I've been asked to provide some elementary projections on how our database will grow as the number of our users grows.

Say I know roughly how many rows in a table will be generated per user per day.

How can I translate that into an estimate of how many MB of storage will be consumed per user per day?

Comments

  • Hi Nicholas,

A rough rule of thumb: take your data and write it to an uncompressed .csv file; Vertica might use somewhere between 10% and 50% of that file's size on disk to store your data.  Maybe.  Depending on your data.

    Generally, you can't estimate the storage cost of a single row because we don't store it as a single row.  We're a compressed column-store database; disk utilization depends heavily on how compressible your data is.  If you have a trillion identical rows, we may only need a few tens of bytes to store all of them.  If you have random raw binary data (for example, encrypted blocks from a good crypto algorithm), we can't pick that data apart so we can't compress it.  If you have a mix of data within the same table, some columns with consistent or repeating values and some with random uncompressible stuff, we'll compress what we can.

If you take a representative sample of your data (preferably not too small -- I'd use on the order of millions of rows if you can), load it, and run the DBD (Database Designer) to optimize its storage and design appropriate projections, that should give you a pretty good rough idea of how much disk space we need to store that many rows.  Assuming your data's compressibility stays the same, we should scale roughly linearly from there.

    Adam
  • OK, that makes sense. And the rule of thumb you provided serves my purposes. Thank you.
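
For anyone doing similar projections: the rule of thumb above can be turned into a tiny back-of-envelope calculator. This is just a sketch; the function name and every number in it are illustrative assumptions (hypothetical rows per day and CSV bytes per row), not measurements, and the 10%-50% band is only the rough guess from the comment above.

```python
# Back-of-envelope storage estimate based on the rule of thumb above:
# on-disk size is assumed to be some fraction (10%-50%) of the raw,
# uncompressed CSV size. All inputs here are made-up examples.

def estimate_mb_per_user_per_day(rows_per_user_per_day,
                                 avg_csv_bytes_per_row,
                                 compression_ratio):
    """Return estimated on-disk MB consumed per user per day."""
    raw_bytes = rows_per_user_per_day * avg_csv_bytes_per_row
    return raw_bytes * compression_ratio / (1024 * 1024)

# Hypothetical example: 10,000 rows/user/day at ~200 bytes/row in CSV form.
low = estimate_mb_per_user_per_day(10_000, 200, 0.10)   # optimistic end
high = estimate_mb_per_user_per_day(10_000, 200, 0.50)  # pessimistic end
print(f"~{low:.2f} to {high:.2f} MB per user per day")
```

Multiplying the result by your user count and a time horizon gives a first-cut growth curve, but as noted above, loading a real sample and measuring beats any formula.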
