Understand disk storage output

I recently looked at the disk_storage table, and I do not understand the results. The query result shows that I am using about 445,000 MB per node, so about 440 GB. But if I run df for the db volume on each node, I see no more than 15,654,552 1K blocks in use, in other words roughly 15 GB. At the same time, the audit function says that I have 120 GB for my schema, and the license compliance check says that I have 90 GB. Can someone reconcile all of these? Thanks

Comments

  • disk_storage reports the df output in MB; it does not report the actual data file / catalog usage. For data files you could aggregate used_bytes from projection_storage at the node_name level, or use the Linux du command. As such, the disk_storage report should line up with the df command. Rather than looking at 1K-blocks, I would look at Used, Available and Use%, and make sure you are looking at the right volume. The 15 GB may not be for the Vertica data volume; check the storage path in disk_storage. As far as audit / license compliance goes, the audit function result may not be reflected in the license compliance check for about 24 hours. Regardless, the audit / license figure is an estimate, so don't treat it as exact (it may be 5-10% off).
  • df and disk_storage are definitely not in sync regarding the disk space in use, but they agree exactly on the free space, about 8200 GB per node. The thing is that if I add used and free space from df, I do not get the total, which makes me think that the df output is wrong. But if I run du in /vertica, it gives the same result as df, about 15 GB on each node. I don't know what to think. All 4 nodes report the same values:
    df
    Filesystem                        1K-blocks     Used      Available   Use%  Mounted on
    . .
    /dev/mapper/VolGroup01-db_vol     8652481412    15654552  8197305992  1%    /vertica
    /dev/mapper/VolGroup01-dbcat_vol  432611152     23069980  387565724   6%    /catalog
  • Coming back to the raw data loaded, what troubles me is that I loaded 36 GB of raw data from text files using the COPY command, but Vertica is reporting 90 or even 118 GB loaded. Could you explain what could be causing this? Thanks
  • What format was your data in when you loaded it? Vertica's audit size (and, relatedly, its license-compliance check) is NOT a measure of the actual disk usage of the data as stored in Vertica. We don't charge you for creating additional projections (which would use more disk space), nor for any inefficiencies in our data-compression algorithms. On the flip side, if you compress your data (gzip, etc.) before loading it, or use our UDL API to load an unusual custom format, that doesn't give you a discount :-) The audit mechanism works as documented here: https://my.vertica.com/docs/6.1.x/HTML/index.htm#15447.htm (It can be closely approximated as "size of the data if it were to be dumped uncompressed into a comma-delimited text file.") As Colin notes, the license-compliance check can lag by up to a day. Both are also statistical estimates (we can't actually try dumping your data to a delimited file, since that would take far too long) and can vary slightly, more so for the audit, depending on the precision parameters that you give it. For more on all of this, you can refer to the various audit and licensing sections of our documentation. For the disk_storage table, what query are you running? Just a "SELECT *", or something with aggregates, etc.? If you really are seeing differences between the disk_storage table and the output of "df", I would suspect that either your Linux mount points are not what you think they are, or that you have filesystem corruption / an invalid free-block count somewhere. In the latter case, you should run an "fsck" at your earliest convenience.
  • The raw data files are either plain text files or csv files, all uncompressed. But I realize that the set of queries I am running also loads additional tables where I store results of aggregations, ...; I guess this explains the difference? Regarding the difference between df and disk_storage, there is in fact none. As I said earlier, df reports the space actually used by data files, in line with du as well as projection_storage. disk_storage also counts as used about 5% of the total disk space, which is in fact used by the OS. You can check that in the df output in my earlier post: if you add free and used, the result is only 95% of the total. But in the end, they all agree on the free space.
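
For anyone who wants to check the arithmetic in the last comment, here is a minimal sketch using only the /vertica figures from the df output posted above. It assumes (this is not confirmed anywhere in the thread) that disk_storage derives its "used" figure as total minus free, which would include the roughly 5% of the volume that df counts as neither used nor available. The variable names are made up for illustration.

```python
# df figures for /vertica from the post above, in 1K blocks.
total_kb = 8652481412
used_kb = 15654552
available_kb = 8197305992

# Space that df reports as neither used nor available.
unaccounted_kb = total_kb - used_kb - available_kb
unaccounted_pct = unaccounted_kb / total_kb * 100

# Assumption: "used" computed as total minus free, converted to MB.
used_as_total_minus_free_mb = (total_kb - available_kb) / 1024

print(f"unaccounted: {unaccounted_kb} KB ({unaccounted_pct:.1f}% of total)")
print(f"total - free: {used_as_total_minus_free_mb:.0f} MB")
```

Under that assumption the result is about 444,507 MB, which is close to the 445,000 MB per node from the original question, while the space used by data files alone stays at roughly 15 GB.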
