Machine Learning using UDFs

Hey All,

Has anyone considered using Vertica for data science? Vertica 8.1 does have some good built-in features for analytics and modeling, but from a customization perspective it is limited to what it was built to do.

  1. Is writing UDFs a solution to that? Can we implement different kinds of classification, regression, and association-learning models in UDFs and save the resulting models? If so, can someone point me to an example? To be clear, I mean custom models, not the predefined Vertica modeling functions.

  2. Can we get parallel-processing gains through UDFs without a PARTITION BY clause? A simple example: I want to perform market basket analysis using the standard algorithm found on the web. The input is a transactional data set listing the items sold in every order. Let's say this transactional set is huge (many orders). If I run this using UDFs in R, it takes a very long time even for a small number of rows, whereas I can run the same function on my local R machine in less than 15 seconds. I cannot partition the data, since I need it to be interpreted as one complete dataset.
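
To make the second question concrete, here is a minimal sketch of the kind of computation I mean. It is written in plain Python rather than R, it is not Vertica UDF code, and it only covers the pair-counting step of a market basket analysis. The function name `pair_support`, the `min_support` threshold, and the toy orders are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(orders, min_support=0.5):
    """Return {item pair: support} for pairs found in at least
    `min_support` of all orders (hypothetical helper, not a Vertica API)."""
    n = len(orders)
    counts = Counter()
    for items in orders:
        # De-duplicate and sort so each unordered pair is counted
        # once per order under a single canonical key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    # Support = fraction of ALL orders that contain the pair.
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Toy transactional data: one list of items per order (made up).
orders = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = pair_support(orders, min_support=0.5)
print(result)
```

Note that the support of a pair is divided by the total number of orders, which is why I want the function to see the data as one complete set rather than as independent partitions.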

Any suggestions would be helpful!

