DB_Designer and IO Load

Hello,

We are in the process of moving our application from Oracle to the Vertica database. I have read and studied how Vertica performs better compared to other databases, but I could not understand how the Database Designer (DBD) works. To make the database faster for application queries, we have to load sample data and supply sample queries to the DBD so it can propose better projections for a table. However, the application I am going to implement on this database is new, so I do not have accurate information about the data in each table or about what the application queries will look like. In this scenario, how can I best design tables in Vertica for good performance?

Also, I understand that when the DBD runs it rearranges almost every block on disk for that database to build better projections, so it seems better to run the DBD in the initial phase of database creation rather than letting the database grow to several TBs first and then running the DBD. Let me know if my assumption is wrong.

Finally, Vertica claims it can run on commodity hardware, but the Tuple Mover (WOS to ROS) and the data-reorganizing process run every 5 and 10 minutes respectively. In a fairly loaded system, how high does the CPU peak, and how frequently do disks fail and need replacement in a Vertica system? I ask because our current system runs a hybrid (OLTP and OLAP) application, and every month we have a disk failure that needs replacement. The disk model number is "SEAGATE ST360057SSUN600G". If you can point me to some statistics, that would be great.

Regards, Ilan
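As an aside, the 5- and 10-minute cadences mentioned above correspond to the Tuple Mover's moveout and mergeout intervals, which Vertica exposes as configuration parameters. A minimal sketch of inspecting and adjusting them, assuming the `V_MONITOR.CONFIGURATION_PARAMETERS` system table and the `SET_CONFIG_PARAMETER` meta-function (check the documentation for your Vertica version; names and defaults here are from memory):

```sql
-- Inspect the Tuple Mover cadence (MoveOutInterval / MergeOutInterval
-- default to 300 and 600 seconds, i.e. the 5 and 10 minutes above).
SELECT parameter_name, current_value
FROM v_monitor.configuration_parameters
WHERE parameter_name IN ('MoveOutInterval', 'MergeOutInterval');

-- Stretch moveout to 10 minutes if WOS-to-ROS churn is too frequent.
-- 600 is an illustrative value, not a recommendation.
SELECT SET_CONFIG_PARAMETER('MoveOutInterval', 600);
```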

Comments

  • It would be nice if someone could point me to good documentation. Thanks.
  • Hi Ilan, The Vertica documentation is available here: http://www.vertica.com/documentation/documentation-6-1/ The Administrator's Guide has a section on the Database Designer. You're correct that the recommended approach is to load a little of your data, run the DBD, then continue loading.
    Vertica typically does more sequential disk scans (as opposed to random disk I/O) than most OLTP solutions, as long as you're querying large tables, so drives should last longer. That said, the main cause of drives wearing out isn't heavy use; it's overheating. If you are experiencing lots of drive failures (I don't know whether one a month is a lot; it is if you have a three-drive RAID and one is failing, but it's to be expected if it's one out of 10,000 drives in a data center), you should make sure your drives are running within their specified temperature range. Running cooler will generally increase a drive's lifespan. Disk drives do run hotter when they are under heavy load (they are, after all, mechanical devices); I've seen systems that needed an extra cooling fan to run an intensive database application safely, though they had no problem with a lighter load. Adam
  • Adam, Thanks for the responses. But how often do we have to run the DBD, since it's an I/O-intensive operation? When the performance of a table or SQL statement is bad, how do I determine that it's time to run the DBD to reorganize the data? When I mentioned disk failures in our environment I was talking about only a single rack, with only one database configured on that entire rack (not the entire data center), so it has 36 disks configured as three-way mirrors. Regards, Ilan
  • Hi Ilan, For the DBD -- the DBD's main job isn't actually to organize your data, though it does that too. What it really does is tell Vertica how data like yours should be organized in general; then it waits while Vertica does an initial reorganization. The Vertica server will then continuously optimize your data into that format as you continue to load more data, so there's no need to ever re-run the DBD.
    The DBD's instructions depend on the layout of your data and on the queries/workload that you're running. So you should re-run the DBD if your data changes dramatically (for example, if you add columns or tables, or if the meaning of a column changes so the data in it starts to look very different from what it used to), or if you start running very different queries. In some cases you can use the DBD's incremental mode to update only a few tables (or a few columns within a table) rather than rewriting the whole data set.
    Regarding disk failures, I'm not the best person to comment on specific numbers for that particular hardware. Others here may speak up. Adam
  • Incidentally, a lot of these questions are answered as part of our free online training courses: http://www.vertica.com/customer-experience/12573-2/ Adam
  • Adam, Thanks for the detailed explanation; I will definitely go through the online training for more details. Appreciate your responses. Coming from an Oracle background, I am very surprised by Vertica's performance in both loading and query responses. The software is not even gigabytes in size like Oracle, and the initial performance we achieved in Vertica is almost comparable to our Exadata systems. -Ilan
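For reference, the load-a-sample/supply-queries/deploy workflow discussed in this thread can also be driven from SQL in later Vertica releases, rather than through the Admin Tools UI. A sketch, assuming the `DESIGNER_*` meta-functions (the design name, file paths, and exact argument lists here are illustrative; verify the signatures against the documentation for your version):

```sql
-- Create a design, feed it tables and sample queries, then deploy.
-- 'app_design' and the /tmp paths are hypothetical placeholders.
SELECT DESIGNER_CREATE_DESIGN('app_design');
SELECT DESIGNER_ADD_DESIGN_TABLES('app_design', 'public.*');
SELECT DESIGNER_ADD_DESIGN_QUERIES('app_design', '/tmp/sample_queries.sql', TRUE);
SELECT DESIGNER_SET_DESIGN_TYPE('app_design', 'COMPREHENSIVE');

-- Generate projection DDL, deploy it, and refresh statistics.
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
    'app_design',
    '/tmp/design.sql',   -- projection DDL script
    '/tmp/deploy.sql',   -- deployment script
    TRUE,                -- analyze statistics
    TRUE);               -- deploy the design

-- Clean up the designer workspace afterwards.
SELECT DESIGNER_DROP_ALL_DESIGNS();
```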
