Machine Learning using UDFs
Hey All,
Has anyone considered using Vertica for Data Science? Vertica 8.1 does have some good built-in features for analytics/modeling, but from a customization perspective, it is limited to what it's built to do.
So, is writing UDFs a solution to that? Can we write different kinds of classification, regression, and association learning models in UDFs and save our models? If yes, can someone point me to an example? To be clear, I don't mean the predefined Vertica modeling functions.
Can we achieve parallel processing gains through UDFs without a PARTITION BY clause? A simple example: I want to perform market basket analysis using the standard algorithm found on the web. The input is a transactional data set listing the items sold in every order. Let's say this transactional set is huge (many orders). If I try to run this using UDFs in R, it takes a very long time even for a small number of rows. However, I can run the same function in R on my local machine in less than 15 seconds. I cannot partition this data since I need it to be interpreted as one complete dataset.
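To make the partitioning concern concrete, here is a rough sketch in plain Python (not Vertica UDF code, and not the R function I am actually running). It only illustrates the support-counting step of market basket analysis: the number of orders containing an itemset is additive, so counts computed on separate chunks of orders can be summed and then filtered as if the data were one set. The function names, the sample orders, and the 0.5 support threshold are all made up for illustration, and whether a UDF can be arranged this way in Vertica is an assumption on my part.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(orders, max_size=2):
    """Count how many orders contain each itemset of size 1..max_size."""
    counts = Counter()
    for items in orders:
        unique = sorted(set(items))  # ignore duplicate items within one order
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return counts


def merge_counts(partials):
    """Sum the per-partition counts into one global count."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def frequent_itemsets(counts, n_orders, min_support):
    """Keep itemsets whose share of all orders is at least min_support."""
    return {
        itemset: count / n_orders
        for itemset, count in counts.items()
        if count / n_orders >= min_support
    }


# Hypothetical sample data: one list of items per order.
orders = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

# Pretend each half was counted by a separate worker.
partitions = [orders[:2], orders[2:]]
merged = merge_counts(count_itemsets(part) for part in partitions)

# Counting the whole data set in one pass gives the same result.
whole = count_itemsets(orders)

frequent = frequent_itemsets(merged, len(orders), min_support=0.5)
```

The support filter has to run on the merged counts, not inside each partition, since an itemset that is rare in one chunk can still be frequent overall.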
Any suggestions would be helpful!
Thanks,
Deepen
Comments
Hi!
Not a very helpful answer, but maybe Distributed R can help: