Options

recommended workflow for multiple databases?

Hi Folks. We are new to Vertica and found it relatively surprising that only one database at a time can be UP/active. In our research we need to access multiple databases at a time, so I'd like to know how other Vertica users manage this limitation. The only approaches I've thought of are a) taking turns (start and stop databases as needed), or b)(mis-)using schemas to group tables into their own databases. Thanks for your help!

Comments

  • Options

    You can have multiple databases. Each database would need dedicated nodes. With a 4 node cluster:

    DB1 on node1, node2
    DB2 on node3, node4

    The way Vertica is designed is intended for a single database instance. Vertica falls under the MPP (Massively Parallel Processing) category. Multiple databases would be competing for resources on across nodes in a cluster. The parallel design enables the distribution of storage and workload across the nodes. In my opinion, the best design is to logically create your schemas like you would databases.

  • Options
    Much obliged, Norbert. I'm curious to see what the performance is with just two nodes.
  • Options
    Hi Matthew, I'd be curious to hear more about what you're doing that's done more effectively with multiple databases than with multiple schemas. (Such use cases certainly exist; it would just be interesting to know which particular use case you're looking at, if you wouldn't mind sharing.) Also, beware that two-node databases don't support high availability; you need at least three. (So that, if your cluster gets split due to network issues, it's unambiguous which side is "bigger" and should stay up.) Thanks, Adam
  • Options
    Much obliged, Norbert. I'm curious to see what the performance is with just two nodes.
  • Options
    Hi Adam. Generally, we are a research lab that runs experiments applying our software on multiple databases because each has its own characteristics that impact our results. Some of those are synthetic and some are real-world, such as stackoverflow, movielens, netflix, and private databases from industries like finances, telecom, genetics, physics, and building control. In other words, we are all over the map. Ideally we'd have one that does it all, but that's not possible. Does that help?
  • Options
    Matthew--I think Adam is looking more specifically for what your experiments are doing (window functions, massive data loads, etc.). When you say "impact your results," it sounds like you're using different methods for each DBMS. Otherwise, it would be helpful to get clarification on those items.
  • Options
    I understand, Norbert. The details of the algorithms and resulting generated SQL are beyond me at this point. I'll follow up when I have more performance information (and when our students come up for air!) Much obliged.

  • Options
    Could you suggest why someone will dedicate nodes within the same cluster instead of creating separate clusters with those nodes:
    You are suggesting cluster C1 has nodes N1, N2, N3 & N4 where N1, N2 are for DB1 and N3, N4 are for DB2
    What about creating Cluster C1 with N1 and N2 and dedicating it for DB1 and creating another cluster C2 with N3 and N4 and dedicating it for DB2?
  • Options
    I think what I meant to say here is that you would have only a single database, and have the logical organization of "databases" at the schema level. Your first schema projection segmentations would be on nodes 1 & 2, and second schema projections on nodes 3 & 4, for example.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file