How does Vertica handle large-scale data processing?

brettlee · April 2023

I am currently exploring options for processing and analyzing large-scale data in my organization, and I am curious about how Vertica handles this task. Can anyone provide insights into the scalability and performance of Vertica when dealing with large datasets? Additionally, are there any best practices or tips for optimizing data processing with Vertica? Thank you in advance for any help or advice.

VValdar · April 2023

MPP database - adding more nodes increases performance in a close to linear manner.
SQL database - probably 99% of what you already know in SQL will work with Vertica.
Columnar - data compresses extremely well, leading to more data processed per IO. Only requested columns in a query are read, thus there's no waste of resources there.
Using projections - as we compress data very well, we don't mind having the same data physically stored in different ways to serve different use cases.
Tons of functions - you can move lots of big data workloads onto the Vertica cluster, even some of the ML ones. Bringing the algorithms to the data is more efficient than bringing the data to the algorithms.
With our Eon Mode, we separate compute and storage. This allows you, for example, to have a dedicated cluster for data science that will not hinder the classic workloads that are also running.
You can try it yourself using our Community Edition (limited to 3 nodes and 1 TB of raw data ingested).
And yes, there are lots of best practices - most of them are common to RDBMS, others are specific to MPP and columnar, but even "as-is" our customers see massive improvements over, say, Oracle DB or PostgreSQL.
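
To make the columnar points above a bit more concrete, here is a toy Python sketch. It is not how Vertica is implemented internally; it only illustrates the general idea that a sorted column with repeated values shrinks a lot under run-length encoding, and that a query touching two columns never needs to read the others. The table, the column names, and the helper functions are all made up for the example.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical table stored column by column (names are illustrative only).
columns = {
    "region": ["EU"] * 4 + ["US"] * 3,
    "amount": [10, 20, 30, 40, 50, 60, 70],
    "comment": ["a", "b", "c", "d", "e", "f", "g"],
}


def total_amount_by_region(store):
    """Aggregate using only the two columns the query needs.

    The "comment" column is never touched, which mirrors the idea that a
    columnar engine reads only the requested columns.
    """
    totals = {}
    for region, amount in zip(store["region"], store["amount"]):
        totals[region] = totals.get(region, 0) + amount
    return totals


# Seven sorted values collapse into two (value, count) pairs.
encoded_region = rle_encode(columns["region"])
totals = total_amount_by_region(columns)

print(encoded_region)
print(totals)
```

The compression gain in this sketch depends on the column being sorted, which is one reason storing the same data in more than one sort order can pay off for different query patterns.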
