Periodic, large-query caching
I'm surprised there isn't more discussion around query caching. My use case is a daily/hourly-updated set of graphs/stats on the data in a Vertica DB. Since some of these graphs can take a while to generate, I'm wondering what people are using to facilitate periodic running of queries and/or query result caching. I'd be happy with a total black-box cache.
I presume there is a favored middleware being used? Either that, or people will say "oh, you shouldn't use Vertica to support a front end directly". Any suggestions? We love Vertica; it's just that we want to make the user experience a little better.
Thanks
Comments
Vertica makes use of the Linux file system cache; maybe this will do it for you, as long as you do not modify the underlying files between reads.
Linux will hold the file's information in copy-on-write pages in memory. (I don't know how Vertica handles them while they are in memory.)
Maybe, as you said, Vertica is not built for this type of work; imagine caching an analytic query (lots of sorted data).
Any middleware that did this would still be based on the existing data.
Maybe in the future this will be possible.
Adrian is correct -- Vertica does not currently cache query plans or results. There is increasing demand for the ability to throw a low-latency UI on top of a Vertica database and serve this sort of Web-type workload, but our roots are as an analytics database that runs big queries quickly.
A simple solution is to cache manually -- save the result of your computation by doing an INSERT .. SELECT into a lookup table (and update the table every hour through a scheduled job of some sort), then just have your applications query this side table.
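The manual-caching pattern above can be sketched as follows — a minimal illustration using Python's sqlite3 as a stand-in for a Vertica connection (the `events`/`daily_stats` tables and column names are hypothetical, invented for the example):

```python
import sqlite3

# Stand-in database with a "raw" table and a precomputed lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (day TEXT, value INTEGER);
    INSERT INTO events VALUES
        ('2013-01-01', 5), ('2013-01-01', 7), ('2013-01-02', 3);
    CREATE TABLE daily_stats (day TEXT, total INTEGER);
""")

def refresh_daily_stats(conn):
    """Scheduled job (e.g., hourly cron): recompute the expensive
    aggregate into the side table via INSERT .. SELECT."""
    with conn:
        conn.execute("DELETE FROM daily_stats")
        conn.execute("""
            INSERT INTO daily_stats (day, total)
            SELECT day, SUM(value) FROM events GROUP BY day
        """)

refresh_daily_stats(conn)

# The front end now reads the small precomputed table instead of
# re-running the aggregation on every page load.
rows = conn.execute("SELECT day, total FROM daily_stats ORDER BY day").fetchall()
print(rows)  # [('2013-01-01', 12), ('2013-01-02', 3)]
```

In a real deployment the refresh function would run against Vertica through a scheduler, and only the final `SELECT` against the side table would be on the front end's hot path.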
More-complex deployments occasionally have Vertica generate the results hourly (or minutely, etc.), then push those results on a regular basis into a small OLTP database (e.g., MySQL, various commercial products, etc.) or a caching solution (memcached, possibly Redis, etc.) that can handle a very high load of much-simpler queries on a small data set. If you need sub-millisecond response times, nothing beats a small in-memory single-node system.
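To make the caching-layer idea concrete, here is a hedged sketch of a tiny in-process TTL cache — a plain dict standing in for memcached/Redis; `expensive_query` and its result are hypothetical placeholders for a slow Vertica query:

```python
import time

class TTLCache:
    """Minimal time-to-live cache: serve a stored result until it
    expires, then recompute it once and store the fresh copy."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)

    def get_or_compute(self, key, compute):
        now = time.time()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: skip the expensive query
        value = compute()            # cache miss: run the query once
        self._store[key] = (now + self.ttl, value)
        return value

calls = 0
def expensive_query():
    global calls
    calls += 1
    return [("widgets", 42)]  # pretend this came back from Vertica

cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_compute("hourly_report", expensive_query)
second = cache.get_or_compute("hourly_report", expensive_query)
print(calls)  # 1 -- the underlying query ran only once
```

A real memcached or Redis client replaces the dict, but the access pattern — check, miss, compute, store with an expiry — is the same.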
It would be straightforward to write a Vertica UDT that cached a resultset in memcached or similar. (I'm not aware of anyone having done so and released the result, but it would be a good introductory project for using our APIs; we'd be glad to accept a Github pull request or, I imagine, a posting on the Vertica Marketplace for that code, should someone write it and want to release it.) You would have to specifically invoke it as part of your query every time, though. And that just caches results; it would still re-plan the query, which takes some (small but nonzero) amount of time.
Many SQL frameworks provide their own result-set caching solutions. If you're using any sort of framework for your development, you might want to go that route.
Adam
Why would you use a tank to deliver a pizza? Use a bike (MySQL) or a tricycle (SQLite).
It's mostly just me trying to avoid more intermediate data representations.
Most middleware / ORM / DAO systems don't seem to work with Vertica, as it might be missing some of the more "fancy" JDBC driver support? I could be wrong.
But I don't know how you would split that work!
If there's a specific bit of functionality that your ORMs depend on that's missing, I'd be curious to hear about it.
If you really do just want to cache expensive once-a-day results, and if you don't need super-low-latency queries, the simplest approach of just generating those results nightly and storing them in a small Vertica table might actually work quite well for you.