How can I estimate the storage cost of a single row?

I've been asked to provide some elementary projections on how our database will grow as the number of our users grows.

Say I know roughly how many rows in a table will be generated per user per day.

How can I translate that into an estimate of how many MB of storage will be consumed per user per day?


    Hi Nicholas,

    A rough rule of thumb: take your data and write it to an uncompressed .csv file; Vertica might use somewhere between 10% and 50% as much disk space as the size of that file to store your data.  Maybe.  Depending on your data.
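    To make the rule of thumb concrete, here is a minimal sketch of the arithmetic. The 10%-50% range is the assumption from the reply above, not a measured value, and `estimate_vertica_footprint` is a hypothetical helper name:

    ```python
    import csv
    import io

    def estimate_vertica_footprint(rows, low=0.10, high=0.50):
        """Write sample rows to an in-memory CSV and apply the rough
        10%-50% rule of thumb for Vertica's on-disk footprint.
        Returns a (low, high) estimate in bytes."""
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        csv_bytes = len(buf.getvalue().encode("utf-8"))
        return csv_bytes * low, csv_bytes * high

    # Hypothetical sample: 1000 rows of (user_id, timestamp, value)
    sample = [(i, "2024-01-01T00:00:00", i * 0.5) for i in range(1000)]
    lo, hi = estimate_vertica_footprint(sample)
    print(f"estimated on-disk footprint: {lo:.0f}-{hi:.0f} bytes")
    ```

    Multiply the per-row result by rows per user per day to get a first-pass growth figure; the caveats below explain why real numbers can land far outside this range.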

    Generally, you can't estimate the storage cost of a single row because we don't store it as a single row.  We're a compressed column-store database; disk utilization depends heavily on how compressible your data is.  If you have a trillion identical rows, we may only need a few tens of bytes to store all of them.  If you have random raw binary data (for example, encrypted blocks from a good crypto algorithm), we can't pick that data apart so we can't compress it.  If you have a mix of data within the same table, some columns with consistent or repeating values and some with random uncompressible stuff, we'll compress what we can.

    If you take a representative sample of your data (preferably not too small -- I'd do order of millions of rows if you can), load it, and run the DBD to optimize its storage and design appropriate projections, that should give you a pretty good rough idea of how much disk space we need to store that many rows.  Assuming your data's compressibility stays the same, we should scale roughly linearly from there.
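    The sample-then-scale approach above boils down to one line of arithmetic. A sketch, with all input numbers (sample size, measured footprint, user counts) being made-up placeholders you would replace with your own measurements:

    ```python
    def project_storage_mb(measured_mb, sample_rows,
                           rows_per_user_per_day, users, days):
        """Scale linearly from a measured sample (loaded and optimized
        via the DBD) to a growth projection. Assumes the data's
        compressibility stays roughly constant as it grows."""
        mb_per_row = measured_mb / sample_rows
        return mb_per_row * rows_per_user_per_day * users * days

    # Hypothetical: a 5M-row sample measures 120 MB on disk;
    # project for 200 rows/user/day, 10,000 users, one year.
    print(project_storage_mb(120, 5_000_000, 200, 10_000, 365))
    ```

    The linear-scaling assumption is the weak point: if your data drifts (say, IDs stop repeating or text fields grow), re-measure with a fresh sample.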

    OK, that makes sense. And the rule of thumb you provided serves my purposes. Thank you.
