recommended workflow for multiple databases?

matthewcornell · January 2014

Hi Folks. We are new to Vertica and found it relatively surprising that only one database at a time can be UP/active. In our research we need to access multiple databases at a time, so I'd like to know how other Vertica users manage this limitation. The only approaches I've thought of are a) taking turns (start and stop databases as needed), or b)(mis-)using schemas to group tables into their own databases. Thanks for your help!

norbert_krupa · January 2014

You can have multiple databases. Each database would need dedicated nodes. With a 4 node cluster:

DB1 on node1, node2
DB2 on node3, node4

The way Vertica is designed is intended for a single database instance. Vertica falls under the MPP (Massively Parallel Processing) category. Multiple databases would be competing for resources on across nodes in a cluster. The parallel design enables the distribution of storage and workload across the nodes. In my opinion, the best design is to logically create your schemas like you would databases.

matthewcornell · January 2014

Much obliged, Norbert. I'm curious to see what the performance is with just two nodes.

[Deleted User] · January 2014

Hi Matthew, I'd be curious to hear more about what you're doing that's done more effectively with multiple databases than with multiple schemas. (Such use cases certainly exist; it would just be interesting to know which particular use case you're looking at, if you wouldn't mind sharing.) Also, beware that two-node databases don't support high availability; you need at least three. (So that, if your cluster gets split due to network issues, it's unambiguous which side is "bigger" and should stay up.) Thanks, Adam

matthewcornell · January 2014

Much obliged, Norbert. I'm curious to see what the performance is with just two nodes.

matthewcornell · January 2014

Hi Adam. Generally, we are a research lab that runs experiments applying our software on multiple databases because each has its own characteristics that impact our results. Some of those are synthetic and some are real-world, such as stackoverflow, movielens, netflix, and private databases from industries like finances, telecom, genetics, physics, and building control. In other words, we are all over the map. Ideally we'd have one that does it all, but that's not possible. Does that help?

norbert_krupa · January 2014

Matthew--I think Adam is looking more specifically for what your experiments are doing (window functions, massive data loads, etc.). When you say "impact your results," it sounds like you're using different methods for each DBMS. Otherwise, it would be helpful to get clarification on those items.

matthewcornell · January 2014

I understand, Norbert. The details of the algorithms and resulting generated SQL are beyond me at this point. I'll follow up when I have more performance information (and when our students come up for air!) Much obliged.

Pradipta_Bibek_ · October 2014

Could you suggest why someone will dedicate nodes within the same cluster instead of creating separate clusters with those nodes:
You are suggesting cluster C1 has nodes N1, N2, N3 & N4 where N1, N2 are for DB1 and N3, N4 are for DB2
What about creating Cluster C1 with N1 and N2 and dedicating it for DB1 and creating another cluster C2 with N3 and N4 and dedicating it for DB2?

norbert_krupa · October 2014

I think what I meant to say here is that you would have only a single database, and have the logical organization of "databases" at the schema level. Your first schema projection segmentations would be on nodes 1 & 2, and second schema projections on nodes 3 & 4, for example.

We're Moving!

Create My New Community Account Now

recommended workflow for multiple databases?

Comments

Leave a Comment