Recommending Related YouTube Videos

tion to the SCC. In Section 4, we will see experimental results showing that node2vec indeed does not give good results for the entire graph, due to the lack of information associated with the majority of the nodes. Further, we also examine the clustering coefficients for the entire graph (0.142) and the SCC (0.441). For our problem, the clustering coefficient is a rough indicator of difficulty: if the clustering coefficient is high, our problem is in some sense easy, since the natural algorithm of predicting a neighbor's neighbor (i.e., a node 2 hops away) will work well. On the other hand, a low clustering coefficient would mean that there is not much benefit to using graph information in our recommender system. Hence, this is a further motivation for working only with the SCC: we have reason to believe that our methods would be more applicable in this setting.

Finally, consider the way this graph has been generated: using a Breadth First Search crawl from a node, discovering at most 20 new nodes at each step. The majority of the nodes will be discovered in the last step of the crawl, due to the exponential growth of the BFS tree. These nodes will not have any outgoing edges, since the BFS has ended. Hence, our dataset is only meaningful when we consider the SCC, where each node present has had its outgoing edges explored.

3.3 Analysis of SCC

Out-degree                                 0      1-5     6-10    11-15   16-20
Proportion of nodes in the entire graph    0.868  0.0004  0.0006  0.0178  0.113
Proportion of nodes in SCC                 0      0.337   0.327   0.218   0.118

Table 1: Out-degree distribution

The SCC has over 216k nodes and 1.85M edges. Hence, focusing on a subset of the graph does not render our problem trivial, since the size is still quite large. To examine the structure of the graph, in Figure 1 we plot the in-degree distributions of the graph.
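The crawl argument made earlier in this section (most nodes are found in the last BFS step and so have no explored out-edges) can be illustrated with a small sketch. This is not the authors' crawler: the `neighbors` callable, the `bfs_crawl` name, the per-node cap parameter, and the toy graph are all assumptions made for illustration, and reading the cap as "at most 20 related videos taken per expanded node" is one interpretation of the text.

```python
from collections import deque


def bfs_crawl(start, neighbors, max_depth, max_new_per_node=20):
    """Crawl outward from `start`, expanding only nodes at depth < max_depth.

    `neighbors` is a hypothetical callable returning a node's related list;
    it stands in for whatever the real crawler queried.
    Returns a dict mapping each discovered node to its explored out-edges.
    """
    out_edges = {start: []}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # frontier node: discovered but never expanded
        for nxt in neighbors(node)[:max_new_per_node]:
            out_edges[node].append(nxt)
            if nxt not in out_edges:
                out_edges[nxt] = []
                queue.append((nxt, depth + 1))
    return out_edges


# Toy graph (invented): every node n links to three fresh nodes.
def toy_neighbors(n):
    return [3 * n + 1, 3 * n + 2, 3 * n + 3]


crawled = bfs_crawl(0, toy_neighbors, max_depth=4)
zero_out = sum(1 for targets in crawled.values() if not targets)
zero_out_fraction = zero_out / len(crawled)
```

On this toy graph the crawl discovers 1 + 3 + 9 + 27 + 81 = 121 nodes, and the 81 nodes at the final depth have no out-edges, so about two thirds of the crawled graph has out-degree 0. A larger branching factor pushes that fraction higher, which is consistent in spirit with the large out-degree-0 share reported in Table 1.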
The in-degree of a node is somewhat analogous to the popularity of the video; the more popular a video is, the more videos will link to it, and hence the higher its in-degree. We observe that the in-degree behaves as expected: there are a few very popular videos of high in-degree, and many videos that have only a few others linking to them. A good null model graph for this dataset would be the preferential attachment model [BA99], since new videos are likelier to link to already popular videos than to unpopular ones. This "rich get richer" type of graph formation gives rise to the degree distribution observed.

Next, we examine the metadata of the videos to see whether the features provided can themselves form a good baseline for prediction. One natural approach is to group videos based on their category, since we would expect related videos to be in the same category. In Figure 2, we plot a histogram of the clustering coefficients of the subgraphs of the SCC induced by the respective categories, and compare them with the clustering coefficient of the overall graph. As explained earlier, we use the clustering coefficient as a proxy for ease of prediction. As we can see, the per-category clustering coefficients are much lower than that of the overall graph, which would not be the case if edges mostly stayed within a category. This may be explained by the YouTube algorithm balancing exploration and exploitation, trying to present the viewer with a diverse set of recommendations, as explained in [DLL+10]. We measure the probability that a recommended video is in the same category as p = 0.5958.

Another natural idea is to predict other videos by the same uploader. We measure the empirical probability that an edge connects videos created by the same uploader, and find that the probability is p = 0.1590.
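The two empirical probabilities above are fractions of edges whose endpoints share a metadata value. A minimal sketch of that measurement follows, assuming the graph is available as a list of directed (source, target) pairs and the metadata as dictionaries keyed by video id. The function name, the data layout, and the toy values are hypothetical and are not taken from the authors' code or dataset.

```python
def same_attribute_edge_fraction(edges, attribute):
    """Fraction of directed edges (u, v) whose endpoints share a value.

    `edges` is an iterable of (source, target) pairs; `attribute` maps a
    video id to a value such as its category or uploader. Both are
    hypothetical stand-ins for the dataset's metadata.
    """
    total = 0
    matching = 0
    for u, v in edges:
        total += 1
        if attribute[u] == attribute[v]:
            matching += 1
    if total == 0:
        raise ValueError("no edges to measure")
    return matching / total


# Toy data, invented for illustration only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")]
category = {"a": "Music", "b": "Music", "c": "Comedy", "d": "Comedy"}
uploader = {"a": "u1", "b": "u2", "c": "u2", "d": "u3"}

p_same_category = same_attribute_edge_fraction(edges, category)
p_same_uploader = same_attribute_edge_fraction(edges, uploader)
```

On the toy data, two of the five edges stay within a category and one of the five stays within an uploader. Applied to the SCC's edge list with the real metadata, the same computation would be one way to obtain figures comparable to the reported 0.5958 and 0.1590.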

Original File Name Searched:

cs224w-66-final.pdf
