Recommending Related YouTube Videos

$$
\alpha(t, x) =
\begin{cases}
\dfrac{1}{p} & \text{if } d_{tx} = 0 \\
1 & \text{if } d_{tx} = 1 \\
\dfrac{1}{q} & \text{if } d_{tx} = 2
\end{cases}
$$

where $d_{tx}$ is the length of the shortest path between $t$ and $x$. The parameters $p$ and $q$ can be tweaked to approximately interpolate between depth-first and breadth-first search: a small $q$ biases the walk toward nodes farther from $t$ (DFS-like), while a large $q$ keeps it close to $t$ (BFS-like). Using these generated neighborhoods, we can compute the gradients of our objective function and optimize it with Stochastic Gradient Descent [Bot10]. Because the model is trained to predict neighborhoods, it is particularly useful in our setting, where we want to predict related videos that might not already be connected by an edge.

4.1 Analysis of node2vec on the YouTube Data Graph

We ran the node2vec algorithm on our dataset and analyzed the embeddings it produced. We first ran node2vec on the whole, unmodified graph to obtain 60-dimensional vectors and computed the average Euclidean distance between neighbors to be 1.23. We then estimated the average distance between non-neighbors by sampling random nodes; surprisingly, this came out to be 0.9482. Since the average distance between neighbors was higher than the average distance between non-neighbors, these learned embeddings clearly did not preserve distances well upon projection from the original metric space.

As described in the previous section, around 90% of the nodes in our dataset are sink nodes. node2vec explores the graph structure using random walks that start from each node. Because random walks starting from sink nodes are empty, node2vec cannot compute satisfactory embeddings for such a graph. To combat this problem, we work only with the strongly connected component of the graph, and we analyze the resulting embeddings in detail below.

4.2 Analysis of node2vec on the SCC

To find the optimal number of dimensions, we plot the average distance between neighbors, and the estimated average distance between non-neighbors, as a function of the number of dimensions of the learned embeddings.

[Figure: "Average Euclidean distance vs # dimensions" (Euclidean distance) and "Average Cosine distance vs # dimensions" (Cosine distance); each panel plots Neighbors and Non Neighbors against #Dimensions, 0 to 100.]

As is evident from these graphs, the distances change little as the number of dimensions varies: node2vec seems robust enough to capture neighbor distances even with a low number of dimensions. In downstream tasks where we use the vectors for non-trivial recommendations, however, the effect of the number of dimensions is more evident, so we set the number of dimensions to 60 as a balance between fast computation and performance. Also, note the difference between Euclidean and cosine distance: the gap between the neighbor and non-neighbor curves appears larger when using cosine distance. This suggests that cosine
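The search bias α(t, x) defined above, and the second-order random walk it drives, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the node2vec reference implementation: the graph is assumed to be unweighted and stored as a dict mapping every node ID (sinks included) to the set of its out-neighbors (list each edge in both directions for an undirected graph), and the helper names `search_bias`, `transition_probs` and `biased_walk` are hypothetical. Because $d_{tx} = 1$ is tested as "there is an edge from $t$ to $x$", the check matches the shortest-path definition exactly only for undirected graphs. The walk stops at a node with no outgoing edges, which is why a walk that starts at a sink node contains nothing but its start node.

```python
import random
from typing import Dict, List, Set

# Hypothetical representation: node ID -> set of out-neighbor IDs.
# Every node, including sink nodes, must appear as a key.
Graph = Dict[str, Set[str]]


def search_bias(graph: Graph, t: str, x: str, p: float, q: float) -> float:
    """alpha(t, x) for a candidate next node x, given the previous node t."""
    if x == t:            # d_tx = 0: step back to the previous node
        return 1.0 / p
    if x in graph[t]:     # d_tx = 1: x is also adjacent to t
        return 1.0
    return 1.0 / q        # d_tx = 2: x moves farther away from t


def transition_probs(graph: Graph, t: str, v: str, p: float, q: float) -> Dict[str, float]:
    """Normalized probabilities of stepping from v to each of its neighbors x.

    The graph is unweighted, so each unnormalized weight is just alpha(t, x).
    Neighbors are sorted so results do not depend on set iteration order.
    """
    weights = {x: search_bias(graph, t, x, p, q) for x in sorted(graph[v])}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def biased_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    """Second-order random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        if not graph[v]:              # no outgoing edges (e.g. a sink): stop here
            break
        if len(walk) == 1:            # no previous node yet: pick a neighbor uniformly
            walk.append(rng.choice(sorted(graph[v])))
            continue
        probs = transition_probs(graph, walk[-2], v, p, q)
        nodes = list(probs)
        walk.append(rng.choices(nodes, weights=[probs[n] for n in nodes])[0])
    return walk
```

For example, with edges a-b, a-c, b-c, b-d, a walk that arrived at b from a with p = 2 and q = 0.5 assigns unnormalized weights 1/p = 0.5 to a, 1 to c (also adjacent to a) and 1/q = 2 to d, i.e. probabilities 1/7, 2/7 and 4/7, so the small q pushes the walk outward. For clarity the sketch recomputes these probabilities at every step; the node2vec paper instead precomputes them and samples with alias tables for efficiency.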

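The evaluation used in Sections 4.1 and 4.2 (average distance between neighbors versus an estimated average distance between non-neighbors, computed on the strongly connected component) can be sketched with the Python standard library alone. Several details are assumptions rather than taken from the report: the component kept is the largest SCC, found here with Kosaraju's algorithm; the graph is a dict mapping every node (sinks included) to the set of its out-neighbors; embeddings are a dict from node to a sequence of floats; non-neighbors are estimated from uniformly sampled node pairs with an edge in neither direction, and the sample size `num_pairs` is hypothetical. All function names are illustrative.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Graph = Dict[str, Set[str]]               # hypothetical: node -> set of out-neighbors
Embedding = Dict[str, Sequence[float]]    # hypothetical: node -> embedding vector


def largest_scc(graph: Graph) -> Set[str]:
    """Largest strongly connected component, via iterative Kosaraju."""
    # Pass 1: record nodes in order of DFS finishing time.
    order: List[str] = []
    seen: Set[str] = set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:             # all children explored: node finishes
                stack.pop()
                order.append(node)
            elif child not in seen:
                seen.add(child)
                stack.append((child, iter(graph[child])))
    # Pass 2: DFS on the reversed graph in decreasing finishing time;
    # each tree found this way is one strongly connected component.
    reverse: Graph = {v: set() for v in graph}
    for u, outs in graph.items():
        for v in outs:
            reverse[v].add(u)
    best: Set[str] = set()
    assigned: Set[str] = set()
    for root in reversed(order):
        if root in assigned:
            continue
        component, todo = set(), [root]
        assigned.add(root)
        while todo:
            u = todo.pop()
            component.add(u)
            for v in reverse[u]:
                if v not in assigned:
                    assigned.add(v)
                    todo.append(v)
        if len(component) > len(best):
            best = component
    return best


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.dist(a, b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def neighbor_vs_non_neighbor(graph: Graph, emb: Embedding,
                             dist: Callable[[Sequence[float], Sequence[float]], float],
                             num_pairs: int, rng: random.Random) -> Tuple[float, float]:
    """(mean distance over edges, sampled mean distance over non-adjacent pairs)."""
    nodes = sorted(graph)
    edges = [(u, v) for u in nodes for v in sorted(graph[u]) if u != v]
    if not edges:
        raise ValueError("graph has no edges")
    neighbor_mean = sum(dist(emb[u], emb[v]) for u, v in edges) / len(edges)
    samples = []
    for _ in range(num_pairs):
        u, v = rng.sample(nodes, 2)
        if v not in graph[u] and u not in graph[v]:   # no edge in either direction
            samples.append(dist(emb[u], emb[v]))
    non_neighbor_mean = sum(samples) / len(samples) if samples else math.nan
    return neighbor_mean, non_neighbor_mean
```

As a toy check, the directed 4-cycle a→b→c→d→a with an extra edge e→a and an isolated node f has largest SCC {a, b, c, d}; restricting the graph with `{u: graph[u] & scc for u in scc}` and placing those four nodes at the corners of a unit square gives a neighbor mean of 1.0 and a non-neighbor (diagonal) mean of √2 ≈ 1.414 under `euclidean`, the ordering one would expect from embeddings that preserve graph distances. Passing `cosine_distance` instead of `euclidean` produces the second panel's metric.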
Original File Name Searched:

cs224w-66-final.pdf
