How to utilize the processing power of all nodes to run R linear regression modelling (lm for exampl

Running R linear models against a "Big Data" window seems to be bottlenecked by the fact that all data is transferred to the single node executing R. Is there any chance to overcome this today, or in the future? Right now, "Revolution" seems to be the only one with an answer to that issue?

Comments

  • Hi,

    Today, Vertica itself can do this in a limited way: if you write an R function via our SDK and install it into Vertica, we will parallelize and distribute its operation for you. (With R User-Defined Transforms, you do have to specify an OVER clause that allows distributed execution; otherwise all the data will be brought to the same node, as you say. "OVER (PARTITION BY AUTO)" is what we recommend if the partitioning and sort order don't matter too much to your algorithm.) But you still have to invoke that R code by calling it as part of a SQL query.

    Also, since the code runs remotely (on your Vertica cluster), not on your local computer, it's at best difficult to do things like displaying graphs, rendering animations, etc. Typically people write R code in two parts: one, the function that does the analysis (which runs distributed on the cluster); two, the local code which connects to Vertica over ODBC, fetches the final data/results, and displays them.

    Tomorrow, well, I can't promise anything myself :-) But take a look at this blog post: http://www.vertica.com/2013/02/21/presto-distributed-r-for-big-data/

    Part of the problem is that innovations in R are needed to support parallelism at all. Another part of the problem is that there's no magic "parallelize" button: the algorithm that you would use to run serially on the local machine and the algorithm that you need in order to distribute and parallelize the work efficiently are often different algorithms, and R doesn't always already have both. We're thinking about both of these issues, but especially when it comes to the huge array of algorithms available in R, we know that we'll need help from the community to come up with efficient parallel equivalents for the bulk of them.

    Adam
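
    To illustrate the point that the serial and the parallel version of an algorithm are often different, here is a minimal sketch of a simple linear regression (y = a + b*x) computed from per-partition summaries instead of from all rows at once. It is written in plain Python purely for illustration; it is not Vertica SDK code and not what R's lm does internally. All names (Stats, partition_stats, distributed_fit) are hypothetical. The assumption is that each node would reduce its own rows to a handful of sums, roughly the role a transform run under OVER (PARTITION BY AUTO) would play, and only those small summaries would be combined in one place.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Row = Tuple[float, float]  # (x, y) pair; hypothetical row layout


@dataclass
class Stats:
    """Sufficient statistics for least squares on y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Stats") -> "Stats":
        # Sums are additive, so summaries from different nodes combine cheaply.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def partition_stats(rows: Iterable[Row]) -> Stats:
    """Per-partition step: reduce local rows to five numbers."""
    s = Stats()
    for x, y in rows:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit(stats: Stats) -> Tuple[float, float]:
    """Final step: solve the normal equations from the merged sums."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


def distributed_fit(partitions: List[List[Row]]) -> Tuple[float, float]:
    """Simulate the cluster: summarize each partition, merge, then fit."""
    total = Stats()
    for part in partitions:
        total = total.merge(partition_stats(part))
    return fit(total)
```

    Because the sums are additive, the result does not depend on how the rows are split across partitions, which is why the partitioning and sort order "don't matter too much" for this kind of algorithm. A full lm replacement (multiple predictors, standard errors, and so on) would need the matrix versions of these sums and is beyond this sketch.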
