Vertica Disk Space

I see disk_space_free_mb information two tables host_resources and disk_storage. In my environment, these tables show different values; while host_resources table shows free space at 45%, disk_storage shows at 16%.
Why such disconnect and which table is actually more relevant to understand that we are running out of disk space?

Comments

  • Jim_KnicelyJim_Knicely Administrator
    edited July 2019

    In the HOST_RESOURCES table:
    DISK_SPACE_FREE_MB = The free disk space available, in megabytes, for all storage location file systems (data directories).

    In the DISK_STORAGE table:
    DISK_SPACE_FREE_MB = The number of megabytes of free storage available.

    They aren't measuring the same thing.

    See:
    https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/HOST_RESOURCES.htm
    https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/DISK_STORAGE.htm

  • Jim, thanks for the tip.
    I found another table; storage usage, that shows the information provided in both tables mentioned by me:

    • host_resources: vertica filesystem, 45% free space, no device listed
    • disk_storage: ext4 filesystem; 16% free space, /dev/sda2 device listed

    For Vertica performance, should I be concerned with metrics under vertica filesystem or ext4 filesystem?

    Storage_Usage table also shows two more filesystems:
    1. ext3 filesystem with /dev/sda1 device having /boot path
    2. devtmpfs with devtmpfs device having /dev path

    These seem to be system related.

  • Bryan_HBryan_H Vertica Employee Administrator

    Hi, your outputs seem slightly different than mine on version 9.2. What version are you running? Each table lists a slightly different view of storage:
    DISK_STORAGE: the properties of the disks (mount points) on which Vertica storage locations (data, catalog, temp folders) reside. These are details of the disks that hold paths listed in STORAGE_LOCATIONS as well as node-local catalog locations.
    HOST_RESOURCES: This is a snapshot of several system parameters, including total/free memory and total/free disk space for storage locations listed in STORAGE_LOCATIONS. It may not list node-local catalog directories if these are not defined in STORAGE_LOCATIONS and/or are mounted on different volumes. This is the sum of all storage locations, so may differ from the detail in DISK_STORAGE and STORAGE_USAGE, which itemize storage.
    STORAGE_USAGE: Lists all local mount points, similar to df -h, so will include additional mounts like root and boot.

    However, if you are running into disk space issues, then the Data Collector tables offer a better view of storage itemized by disk. This view show disk usage trends by day, so maybe it is closer to what you need:
    select * from v_internal.dc_storage_info_by_day;
    You will probably want to add a WHERE clause to filter to the disk(s) where the Vertica storage locations reside, e.g. for my DATA storage location on /data1/dbadmin/vertica:
    select * from v_internal.dc_storage_info_by_day where path = '/data1';
    (My /data1 is mounted on /dev/sdb1 so I could also use WHERE device = '/dev/sdb1')

    You can find the description of all system tables here, the ones you reference are all in V_MONITOR: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/SystemTables/VerticaSystemTables.htm
    Data Collector tables in the V_INTERNAL schema are listed in V_MONITOR.DATA_COLLECTOR

  • Bryan, thanks for the tips. I am using 9.2 version. Vertica documentation states that it is recommended to have 40% free space for Vertica to run smoothly. I am trying to figure out which disk storage stats applies.

    Thank you all for help in understanding Vertica storage system better.

  • Bryan_HBryan_H Vertica Employee Administrator

    This is the query that support recommends to check for over 60% usage:

    SELECT node_name, storage_usage, (sum(disk_space_used_mb) * 100/sum(disk_space_free_mb + disk_space_used_mb))::int storage_usage_percent
    FROM disk_storage
    GROUP BY 1, 2
    --having (sum(disk_space_used_mb) * 100/sum(disk_space_free_mb + disk_space_used_mb))::int > 60
    ORDER BY 3 desc limit 10;
    --(Note, I have commented out the greater than 60 condition so all storage will be shown)

    It's necessary to keep storage free so it is always possible to write the catalog. Queries may fail if Vertica is unable to write data or temp, but the system may fail if catalog fills up. For production systems, we recommend storing catalog on its own mount point. On modern systems, perhaps a better metric than 60% full is to assume you need to use disk as swap space, so keep X GB free where X is the amount of physical RAM. Then, consider how much data you are loading, and leave more disk space free based on how much data you are loading.

  • Thanks Bryan. Appreciate all details.

  • You can learn some info about Disk Space Requirements for Vertica here: https://docsbay.net/disk-space-requirements-for-vertica
    Hopefully it helps!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file