Optimized Hardware for Vertica

drumace Registered User

Hello,
I use Vertica to perform all sorts of analytical calculations on large input files (millions to tens of millions of entries to process).
The hardware I'm using is a PC workstation with an Intel i7 Extreme processor, 64GB RAM and four 512GB SSD drives for storage, configured as a RAID 5 array.
I'm using both a single-node configuration and a three-node configuration.
Each table set relevant to a specific analysis is stored entirely on one node (data is not split across nodes), so there should be no bottleneck related to the network between the workstations.
The performance I'm currently getting is not satisfactory for my application, and I'm searching for ways to improve the results before considering another analytical database.
The queries were already optimized with the help of a certified Vertica expert, so the answer is not there. That's why I'm looking for a hardware-level solution.
Please let me know what you think about the current hardware I'm using and suggest improvements.

Thanks,
Tal

Comments

  • drumace Registered User

    Still not a single answer?

  • Ben_Vandiver Employee, Registered User, VerticaExpert

    It's hard to say much concrete without more details about the workload. How much concurrency? Do joins spill? Query duration? Disk footprint? What resource is the bottleneck?
    Generally, 64GB of RAM looks a little small compared to other hardware configurations I've seen. But your problem also looks on the small side (10M rows - I know Vertica databases with 10T+ rows). So tuning may look different at your scale.

  • Jim_Knicely Employee, Registered User, VerticaExpert
    edited May 1

    It's quite obvious. Bad design? Can you share some SQL with explain plans?

  • drumace Registered User

    Hi guys,
    Thanks for the response.
    We do not use joins at all. Concurrency is pretty low (maybe 5-10 users at the same time in extreme cases).
    Queries take 5-10 seconds on large input files, and some queries take even longer.
    Some of the queries are implemented as UDFs.
    As for resources - I used the "top" command and in some cases the CPU reaches above 300%. The RAM utilization is pretty low though (mostly free). The disks are also mostly free.
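
A rough way to read the "top" figure above: in its default mode on Linux, top reports a process's CPU usage as a percentage of a single core, so 300% corresponds to about three cores' worth of work. The sketch below converts such a reading into busy cores and a share of the whole machine. The function names and the count of 12 logical cores are hypothetical, chosen only for illustration; the thread does not state the exact CPU model.

```python
import os


def busy_cores(top_percent: float) -> float:
    """Cores' worth of work implied by a per-process %CPU reading from top.

    Assumes top's default mode, where 100% means one fully busy core.
    """
    return top_percent / 100.0


def machine_share(top_percent: float, logical_cores: int) -> float:
    """Fraction of total CPU capacity used, given the number of logical cores."""
    if logical_cores <= 0:
        raise ValueError("logical_cores must be positive")
    return top_percent / (100.0 * logical_cores)


if __name__ == "__main__":
    reading = 300.0                # the "above 300%" figure from the post
    assumed_cores = 12             # hypothetical core count, not from the thread
    local_cores = os.cpu_count()   # logical cores on the machine running this

    print(f"busy cores: {busy_cores(reading):.1f}")
    print(f"share of an assumed {assumed_cores}-core machine: "
          f"{machine_share(reading, assumed_cores):.0%}")
    if local_cores:
        print(f"share of this machine ({local_cores} cores): "
              f"{machine_share(reading, local_cores):.0%}")
```

If the share stays well below 100% while RAM and disks are mostly idle, that hints that the work is not spreading across all cores, which would be worth checking before buying hardware.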
