Deep Neural Networks for YouTube Recommendations

[Figure 3 diagram. Components: embedded video watches and embedded search tokens (each averaged into a watch vector and a search vector), geographic embedding, example age, gender, three ReLU layers, user vector; training: softmax class probabilities; serving: nearest neighbor index over video vectors returning the approximate top N.]

Figure 3: Deep candidate generation model architecture showing embedded sparse features concatenated with dense features. Embeddings are averaged before concatenation to transform variable-sized bags of sparse IDs into fixed-width vectors suitable for input to the hidden layers. All hidden layers are fully connected. In training, a cross-entropy loss is minimized with gradient descent on the output of the sampled softmax. At serving, an approximate nearest neighbor lookup is performed to generate hundreds of candidate video recommendations.

case in which the user has just issued a search query for “taylor swift”. Since our problem is posed as predicting the next watched video, a classifier given this information will predict that the most likely videos to be watched are those which appear on the corresponding search results page for “taylor swift”. Unsurprisingly, reproducing the user’s last search page as homepage recommendations performs very poorly. By discarding sequence information and representing search queries with an unordered bag of tokens, the classifier is no longer directly aware of the origin of the label.

Natural consumption patterns of videos typically lead to very asymmetric co-watch probabilities. Episodic series are usually watched sequentially, and users often discover artists in a genre beginning with the most broadly popular before focusing on smaller niches. We therefore found much better performance predicting the user’s next watch, rather than predicting a randomly held-out watch (Figure 5). Many collaborative filtering systems implicitly choose the labels and context by holding out a random item and predicting it from other items in the user’s history (5a).
This leaks future information and ignores any asymmetric consumption patterns. In contrast, we “rollback” a user’s history by choosing a random watch and inputting only the actions the user took before the held-out label watch (5b).

3.5 Experiments with Features and Depth

Adding features and depth significantly improves precision on holdout data, as shown in Figure 6. In these experiments, a vocabulary of 1M videos and a vocabulary of 1M search tokens were embedded with 256 floats each, with a maximum bag size of 50 recent watches and 50 recent searches. The softmax layer outputs a multinomial distribution over the same 1M video classes with a dimension of 256 (which can be thought of as a separate output video embedding). These models were trained until convergence over all YouTube users, corresponding to several epochs over the data. Network structure followed a common “tower” pattern in which the bottom of the network is widest and each successive hidden layer halves the number of units (similar to Figure 3). The depth zero network is effectively a linear factorization scheme which
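A minimal sketch of the candidate generation forward pass in Figure 3 combined with the “tower” layout described in Section 3.5, written in plain Python with no ML framework. Every function name here is hypothetical, and so are the toy sizes (4-float embeddings instead of 256, a 16-8-4 tower), the three dense features, and the random weights standing in for learned parameters. Training with the sampled softmax is omitted, and the approximate nearest neighbor index is replaced by an exact dot-product scan over output video vectors.

```python
import math
import random

def average_embeddings(ids, table, dim):
    """Average a variable-sized bag of sparse IDs into one fixed-width vector."""
    if not ids:
        return [0.0] * dim  # assumption: an empty bag maps to the zero vector
    out = [0.0] * dim
    for i in ids:
        for d, v in enumerate(table[i]):
            out[d] += v
    return [v / len(ids) for v in out]

def tower_widths(first, depth):
    """'Tower' layout: each successive hidden layer halves the number of units."""
    return [first // 2 ** k for k in range(depth)]

def random_layer(n_in, n_out, rng):
    """Hypothetical randomly initialized weights standing in for learned ones."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def relu_layer(x, weights, biases):
    """Fully connected layer followed by a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def user_vector(watch_ids, search_ids, dense, watch_table, search_table, dim, layers):
    # Concatenate averaged sparse embeddings with dense features, then run the tower.
    x = (average_embeddings(watch_ids, watch_table, dim)
         + average_embeddings(search_ids, search_table, dim)
         + dense)
    for weights, biases in layers:
        x = relu_layer(x, weights, biases)
    return x

def top_n(u, video_vectors, n):
    """Exact dot-product top-N; a stand-in for an approximate nearest neighbor index."""
    scores = {vid: sum(a * b for a, b in zip(u, vec)) for vid, vec in video_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy usage with hypothetical sizes and random (untrained) parameters.
rng = random.Random(0)
dim = 4
watch_table = {v: [rng.gauss(0, 1) for _ in range(dim)] for v in range(10)}
search_table = {t: [rng.gauss(0, 1) for _ in range(dim)] for t in range(10)}
dense = [0.3, 1.0, 0.0]  # hypothetical normalized dense features (e.g. example age, gender)
widths = tower_widths(16, 3)  # [16, 8, 4]
layers, n_in = [], 2 * dim + len(dense)
for n_out in widths:
    layers.append(random_layer(n_in, n_out, rng))
    n_in = n_out
u = user_vector([1, 2, 3], [4, 5], dense, watch_table, search_table, dim, layers)
# Output video vectors must match the last tower width for the dot product.
video_vectors = {v: [rng.gauss(0, 1) for _ in range(widths[-1])] for v in range(20)}
print(top_n(u, video_vectors, 5))
```

Averaging is what lets a user with 3 watches and a user with 50 watches feed the same fixed-width input layer; the halving widths mirror the tower pattern, and the final width must equal the dimension of the output video embeddings so that serving reduces to a dot-product search.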

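The difference between the two labeling schemes of Figure 5 can be sketched as follows, assuming a per-user watch history stored as a Python list ordered oldest to newest. The helper names are hypothetical; the 50-watch context cap mirrors the maximum bag size quoted in Section 3.5, and truncating to the most recent watches before the label is an assumption about how that cap is applied.

```python
import random

def random_holdout_example(history, rng):
    """Figure 5a style: hold out a random watch; all other watches, including later ones, are context."""
    i = rng.randrange(len(history))
    return history[:i] + history[i + 1:], history[i]

def rollback_example(history, rng, max_context=50):
    """Figure 5b style: pick a random label watch; context is only the watches before it."""
    if len(history) < 2:
        raise ValueError("need at least one watch before the label")
    i = rng.randrange(1, len(history))
    # Keep at most max_context of the most recent watches preceding the label.
    return history[max(0, i - max_context):i], history[i]

# Hypothetical watch history, ordered oldest to newest.
history = ["ep1", "ep2", "ep3", "ep4", "ep5"]
rng = random.Random(7)
ctx, label = random_holdout_example(history, rng)
print("random holdout:", ctx, "->", label)  # context may include episodes after the label
ctx, label = rollback_example(history, rng)
print("rollback:", ctx, "->", label)  # context only includes episodes before the label
```

With an episodic series, the random holdout can ask the model to predict episode 3 while showing it episodes 4 and 5, which is exactly the future leakage the rollback scheme avoids.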