We have a system that loads a high volume of data every minute using Kafka and Vertica.
What are the best practices and important considerations for designing projections in this case?
From my experience:
Sorry, there is no silver bullet (IMHO).
The general rule is to stand up your cluster, load a good amount of data, and then run the Database Designer (DBD) on the entire dataset along with a representative set of queries. You can then edit the generated DDL as you see fit, or simply accept the entire recommendation. Projection design is an iterative process; it's something you'll want to revisit over time as more data, more users, and new queries are added to your database.
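To illustrate the kind of DDL you might end up hand-editing after a DBD run, here is a minimal sketch of a projection for a streaming fact table. The table and column names (`web_events`, `event_time`, `user_id`) are hypothetical; the right sort order, encodings, and segmentation depend entirely on your own data and queries:

```sql
-- Hypothetical fact table continuously loaded from Kafka.
CREATE TABLE web_events (
    event_time TIMESTAMP NOT NULL,
    user_id    INT       NOT NULL,
    event_type VARCHAR(32),
    payload    VARCHAR(1000)
);

-- Projection sketch: sort on the columns your queries filter and
-- group by most often, and segment on a high-cardinality column
-- so rows spread evenly across nodes during parallel loads.
CREATE PROJECTION web_events_p (
    event_time,
    user_id,
    event_type,
    payload
) AS
SELECT event_time, user_id, event_type, payload
FROM web_events
ORDER BY event_time, user_id          -- match common predicates; aids encoding
SEGMENTED BY HASH(user_id) ALL NODES  -- even distribution across the cluster
KSAFE 1;                              -- keep a buddy projection for fault tolerance
```

For a high-frequency load pattern like per-minute Kafka ingestion, also keep the number of projections per table small, since every projection is written on each load and adds to the Tuple Mover's mergeout work.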