User-User similarity?
Hello,

I received the following question from a Vertica community user.

I am testing Vertica with R right now. I went through the example SDK for R and am looking to see what I can do with it. I also read about Presto, but I don't see it being available just yet.

I am also trying to build user-user similarity with Vertica. My input table is 3.9m rows of 372k users x items. For example:

    user       itm  wt
    1,000,156  19   1
    1,000,156  11   25
    1,000,156  18   1
    1,000,156  1    1

As a first step I am calculating the sum of weights for each user, or co-occurrence if you ignore weights. Following your example "http://www.vertica.com/2011/09/21/counting-triangles/", I ran:

    create table usr_cooccurence as
    select mat1.uid mat1_uid,
           mat2.uid mat2_uid,
           sum(mat1.wt + mat2.wt) total_wt
    from v_user_item_matrix mat1
    inner join v_user_item_matrix mat2
        on (mat1.style = mat2.style and mat1.uid < mat2.uid)
    group by 2,1
    segmented by hash(mat1_uid, mat2_uid) all nodes;

But I run out of temp space (I already increased it to 32G, which is all I can do). I can do this on Hadoop and am wondering if that is the right way to go. But so far, I like the convenience of SQL in Vertica. I think you have done a very good job, and I want to see if this is the right tool for the job.

Can somebody please take a look at this and provide a recommendation?

Thanks,
Matt
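For readers who want to see what the query above computes, here is a minimal sketch that runs the same self-join on a tiny sample. It uses Python with an in-memory SQLite database instead of Vertica, so the Vertica-specific `segmented by ... all nodes` clause is dropped. The column names (uid, style, wt) follow the query in the post; the sample rows are invented for illustration. The second query counts how many rows the join emits before grouping: an item shared by n users contributes n*(n-1)/2 pairs, which may be part of why the intermediate result gets large on popular items.

```python
import sqlite3

# Made-up sample rows standing in for v_user_item_matrix: (uid, style, wt).
rows = [
    (1, 19, 1), (1, 11, 25),
    (2, 19, 2), (2, 11, 1),
    (3, 19, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v_user_item_matrix (uid INTEGER, style INTEGER, wt INTEGER)"
)
conn.executemany("INSERT INTO v_user_item_matrix VALUES (?, ?, ?)", rows)

# Same join shape as in the post: pair users that share an item,
# keeping each unordered pair once via mat1.uid < mat2.uid.
pairs = conn.execute("""
    SELECT mat1.uid, mat2.uid, SUM(mat1.wt + mat2.wt) AS total_wt
    FROM v_user_item_matrix mat1
    JOIN v_user_item_matrix mat2
      ON mat1.style = mat2.style AND mat1.uid < mat2.uid
    GROUP BY mat1.uid, mat2.uid
    ORDER BY mat1.uid, mat2.uid
""").fetchall()

# Rows the join produces before aggregation: n*(n-1)/2 per item,
# where n is the number of users holding that item.
join_rows = conn.execute("""
    SELECT SUM(n * (n - 1) / 2)
    FROM (SELECT COUNT(*) AS n FROM v_user_item_matrix GROUP BY style)
""").fetchone()[0]

for uid1, uid2, total_wt in pairs:
    print(uid1, uid2, total_wt)
print("join rows before grouping:", join_rows)
```

Running the per-item count query against the real table would show which items dominate the join size. Whether that explains the temp-space failure here is an assumption, not something the post confirms.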