Periodic, large-query caching
I'm surprised there isn't more discussion around query caching. My use case is a daily/hourly-updated set of graphs/stats on the data in a Vertica DB. Since some of these graphs can take a while to generate, I'm wondering what people are using to facilitate periodic running of queries and/or query result caching. I'd be happy with a total black-box cache.
I presume there is a favored middleware being used? Either that, or people will say "oh, you shouldn't use Vertica to support a front end directly". Any suggestions? We love Vertica; it's just that we want to make the user experience a little better.
Thanks
Comments
Vertica makes use of the Linux file system cache; maybe this will do it for you, as long as you do not modify the underlying files between reads.
Linux will hold the file's information in copy-on-write pages in memory. (I don't know how Vertica handles them while they are in memory.)
Maybe, as you said, Vertica is not built for this type of work; imagine caching an analytic query (lots of sorted data).
Any middleware that did this would still be based on the existing data.
Maybe in the future this will be possible.
Adrian is correct -- Vertica does not currently cache query plans or results. There is increasing demand for the ability to throw a low-latency UI on top of a Vertica database and serve this sort of Web-type workload, but our roots are as an analytics database that runs big queries quickly.
A simple solution is to cache manually -- save the result of your computation by doing an INSERT .. SELECT into a lookup table (and update the table every hour through a scheduled job of some sort), then just have your applications query this side table.
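The manual-caching pattern above can be sketched as follows — a minimal illustration using Python's sqlite3 as a stand-in for a Vertica connection (the `events`/`daily_stats` tables and column names are hypothetical, invented for the example):

```python
import sqlite3

# Stand-in database with a "raw" table and a precomputed lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (day TEXT, value INTEGER);
    INSERT INTO events VALUES
        ('2013-01-01', 5), ('2013-01-01', 7), ('2013-01-02', 3);
    CREATE TABLE daily_stats (day TEXT, total INTEGER);
""")

def refresh_daily_stats(conn):
    """Scheduled job (e.g., hourly cron): recompute the expensive
    aggregate into the side table via INSERT .. SELECT."""
    with conn:
        conn.execute("DELETE FROM daily_stats")
        conn.execute("""
            INSERT INTO daily_stats (day, total)
            SELECT day, SUM(value) FROM events GROUP BY day
        """)

refresh_daily_stats(conn)

# The front end now reads the small precomputed table instead of
# re-running the aggregation on every page load.
rows = conn.execute("SELECT day, total FROM daily_stats ORDER BY day").fetchall()
print(rows)  # [('2013-01-01', 12), ('2013-01-02', 3)]
```

In a real deployment the refresh function would run against Vertica through a scheduler, and only the final `SELECT` against the side table would be on the front end's hot path.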
More-complex deployments occasionally have Vertica generate the results hourly (or minutely, etc.), then push those results on a regular basis into a small OLTP database (e.g., MySQL, various commercial products, etc.) or a caching solution (memcached, possibly Redis, etc.) that can handle a very high load of much-simpler queries on a small data set. If you need sub-millisecond response times, nothing beats a small in-memory single-node system.
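To make the caching-layer idea concrete, here is a hedged sketch of a tiny in-process TTL cache — a plain dict standing in for memcached/Redis; `expensive_query` and its result are hypothetical placeholders for a slow Vertica query:

```python
import time

class TTLCache:
    """Minimal time-to-live cache: serve a stored result until it
    expires, then recompute it once and store the fresh copy."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)

    def get_or_compute(self, key, compute):
        now = time.time()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: skip the expensive query
        value = compute()            # cache miss: run the query once
        self._store[key] = (now + self.ttl, value)
        return value

calls = 0
def expensive_query():
    global calls
    calls += 1
    return [("widgets", 42)]  # pretend this came back from Vertica

cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_compute("hourly_report", expensive_query)
second = cache.get_or_compute("hourly_report", expensive_query)
print(calls)  # 1 -- the underlying query ran only once
```

A real memcached or Redis client replaces the dict, but the access pattern — check, miss, compute, store with an expiry — is the same.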
It would be straightforward to write a Vertica UDT that cached a resultset in memcached or similar. (I'm not aware of anyone having done so and released the result, but it would be a good introductory project for using our APIs; we'd be glad to accept a Github pull request or, I imagine, a posting on the Vertica Marketplace for that code, should someone write it and want to release it.) You would have to specifically invoke it as part of your query every time, though. And that just caches results; it would still re-plan the query, which takes some (small but nonzero) amount of time.
Many SQL frameworks provide their own result-set caching solutions. If you're using any sort of framework for your development, you might want to go that route.
Adam
Why would you use a tank to deliver a pizza? Use a bike (MySQL) or a tricycle (SQLite).
It's mostly just me trying to avoid more intermediate data representations.
Most middleware / ORM / DAO systems don't seem to work with Vertica, as it might be missing some of the more "fancy" JDBC driver support? I could be wrong.
But I don't know how you would split that work!
If there's a specific bit of functionality that your ORMs depend on that's missing, I'd be curious to hear about it.
If you really do just want to cache expensive once-a-day results, and if you don't need super-low-latency queries, the simplest approach of just generating those results nightly and storing them in a small Vertica table might actually work quite well for you.