User-User similarity?
Hello, I received the following question from a Vertica community user.

I am testing Vertica with R right now. I went through the example SDK for R and am looking to see what I can do with it. I also read about Presto, but I don't see it being available just yet.

I am also trying to build user-user similarity with Vertica. My input table is 3.9m rows of 372k users x items. For example:

    user       itm  wt
    1,000,156  19   1
    1,000,156  11   25
    1,000,156  18   1
    1,000,156  1    1

As a first step I am calculating the sum of weights for each pair of users, or the co-occurrence count if you ignore weights. Following your example "http://www.vertica.com/2011/09/21/counting-triangles/", I ran:

    create table usr_cooccurence as
    select mat1.uid mat1_uid,
           mat2.uid mat2_uid,
           sum(mat1.wt + mat2.wt) total_wt
    from v_user_item_matrix mat1
    inner join v_user_item_matrix mat2
        on (mat1.style = mat2.style and mat1.uid < mat2.uid)
    group by 2,1
    segmented by hash(mat1_uid, mat2_uid) all nodes;

But I run out of temp space (I increased it to 32G, and that is all I can do). I can do this on Hadoop and am wondering if that is the right way to go. But so far, I like the convenience of SQL in Vertica. I think you have done a very good job, and I want to see if this is the right tool for the job.

Can somebody please take a look at this and provide a recommendation?

Thanks,
Matt
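
For sizing the problem, it may help to estimate how many rows the self-join emits before the GROUP BY collapses them: an item shared by n users contributes n*(n-1)/2 user pairs. The sketch below is only an illustration, not a tuning recommendation. It runs the same join shape in an in-memory SQLite database through Python's sqlite3 module; the rows are made up, the table and column names are taken from the query in the post, and it assumes each (uid, style) pair appears at most once.

```python
import sqlite3

# Toy stand-in for v_user_item_matrix; these rows are invented for illustration.
rows = [
    (1, 19, 1), (1, 11, 25), (1, 18, 1),
    (2, 19, 2), (2, 11, 1),
    (3, 19, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v_user_item_matrix (uid INTEGER, style INTEGER, wt INTEGER)"
)
conn.executemany("INSERT INTO v_user_item_matrix VALUES (?, ?, ?)", rows)

# Rows produced by the self-join before aggregation:
# n*(n-1)/2 user pairs for every item that n users share.
join_rows = conn.execute("""
    SELECT SUM(n * (n - 1) / 2)
    FROM (SELECT style, COUNT(*) AS n
          FROM v_user_item_matrix
          GROUP BY style)
""").fetchone()[0]

# Same join shape as the query in the post, run on the toy table.
pairs = conn.execute("""
    SELECT mat1.uid, mat2.uid, SUM(mat1.wt + mat2.wt) AS total_wt
    FROM v_user_item_matrix mat1
    JOIN v_user_item_matrix mat2
      ON mat1.style = mat2.style AND mat1.uid < mat2.uid
    GROUP BY mat1.uid, mat2.uid
    ORDER BY mat1.uid, mat2.uid
""").fetchall()

print(join_rows)
print(pairs)
```

Running the first query alone against the real table would give a rough idea of the intermediate volume: a handful of very popular items can dominate the total, since the pair count grows quadratically with the number of users per item.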