Practical Diversified Recommendations on YouTube with Determinantal Point Processes

PDF Publication Title:

Practical Diversified Recommendations on YouTube with Determinantal Point Processes ( practical-diversified-recommendations-youtube-with-determina )

Previous Page View | Next Page View | Return to Search List

Text from PDF Page: 004

CIKM’18, October 2018, Turin, Italy Mark Wilhelm, Ajith Ramanathan, Alex Bonomo, et al. Figure 2: The new serving scheme. this can be done in a way which integrates and evolves well with existing infrastructure. 3.3 Design Desiderata If item similarity (as defined in Equation 4) exists in the dataset, and the dataset is sufficiently large, then our goal can likely be achieved by a wide variety of different methods. We favor a method which: (1) fits well into the existing logical framework of building machine-learned estimators of observable physical events, (2) can gracefully scale in complexity over time, and (3) can be applied without requiring huge changes to existing systems and expertise. Heuristics can be effective [45] but are not ideal. For example, imagine enforcing the rule that within a window of n adjacent items,notwoitemsmayhaveDij <τ.Anumberofissuesarise: (1) This rule operates independently of q, meaning that it will demote high-scoring items under the same conditions as it does low-scoring items. Independent improvements to the accuracy of q may be lost after applying the policy. (2) Parameters n and τ can be found by brute force grid search, but adding complexity will become prohibitive, as training time will be exponential in the number of parameters. (3) It is not entirely obvious how to extend the rule to make incremental improvements over time, beyond somehow in- corporating q. (4) It cannot be used as a generative model to create synthetic datasets for offline validation. An important point is that this heuristic implicitly treats the redundancy problem as a fundamentally different objective from maximizing utility. In fact, it suggests the hypothesis that improving diversity might reduce utility (at least in the short term), since it throws away items that have high q values. In contrast, our proposed approach considers utility on pairs of items (via the anti- correlation described in Equation 4), and hence is able to use utility itself to better justify demoting certain items. Of course, it is possible to define a heuristic based upon anti- correlation, such as “no two items are allowed in the same feed if P(yi =1,yj =1) is below x”. However, as mentioned above, this rule P(yi=1)P(yj=1) does not account for q, would require frequent re-tuning of the parameter x, and even with regular tuning is not flexible enough to capture exactly the behavior that we desire. Hence, in place of such heuristic rules, we introduce DPPs into the system as a more principled way of diversifying recommendations. We insert the DPPs before the policy layer, but after the point- wise scoring layer (see Figure 2). This allows us to leverage the investment in an extremely sophisticated pointwise scorer, and also ensures that business policies are respected. 4 METHOD 4.1 DPP Overview We start with a high-level overview of determinantal point pro- cesses (DPPs). A point process P on a set S = {1,2,...,N} (e.g., a set of N videos in a user’s YouTube mobile homepage feed) is a probability distribution on the powerset of S (the set of all sub- sets of S). That is, ∀S ⊆ S, P assigns some probability P(S), and 􏰽S ⊆S P(S) = 1. DPPs represent a family of probability distribu- tions whose parameters can be tuned such that the probability of a subset, P(S), is proportional to a combined measure of the quality of the items in S and the diversity of these items. Thus, finding the set maxS:|S |=k P(S) is one way of selecting a high-quality and diverse subset of k items from a larger pool of N items. As mentioned in Section 2, there are many reasonable measures that take into account both item quality and diversity, such as the maximal marginal relevance (MMR) approach [5]. The advantage of using DPPs is two-fold: 1) DPPs can out-perform measures such as MMR on recommendation tasks [20], and 2) a DPP is a proba- bilistic model. This latter point means that we can take advantage of algorithms for probabilistic operations like marginalization, con- ditioning, and sampling. The availability of these operations aligns well with our goal of building a system that can gracefully scale in complexity over time. We now describe how we use a DPP to model user behavior. Recall that for a feed with N items, the length-N binary vector y indicates which videos from the feed the user interacted with. Let Y denote the index set of these items (e.g., for y = [0, 1, 0, 0, 1, 1] we have Y = {2, 5, 6}). Then we assume that a user u’s behavior is modeled by a DPP with probability distribution P in the following manner: Y ∼ Pu . That is, the set of videos interacted with, Y , represents a draw from the probability distribution defined by a user-specific DPP. Even though a DPP defines a probability distribution over an exponentialnumberofsets(all2N subsetsofS={1,2,...,N}),it can be compactly parameterized by a single N × N positive semi- definite kernel matrix [4], which we will call L. More concretely, the probabilities of a DPP can be written as determinants of submatrices of L: det(LY ) P(Y)= 􏰽Y′⊆Sdet(LY′) . (6) where LY is L restricted to only those rows and columns which are indexed by Y (e.g., for Y = {2,5,6}, the matrix LY is size 3 × 3). Note that the denominator in Equation 6 is simply a normalizing term, and, due to various properties of determinants, it can actually be written and computed efficiently as a single determinant: 􏰾 det(LY ) = det(L + I) , (7) Y ⊆S where I is the identity matrix. To see how det(LY ) can define a balanced measure of the quality and diversity of a set of items, it helps to think of the entries of L in the following manner: 1) a diagonal entry Lii is a measurement of the quality of an item i, and 2) an off-diagonal element Lij is a scaled measurement of the similarity between items i and j. With these intuitions, let’s consider a case where |Y | = 2. If Y = {1, 2},

PDF Image | Practical Diversified Recommendations on YouTube with Determinantal Point Processes

practical-diversified-recommendations-youtube-with-determina-004

PDF Search Title:

Practical Diversified Recommendations on YouTube with Determinantal Point Processes

Original File Name Searched:

cikm2018.pdf

DIY PDF Search: Google It | Yahoo | Bing

Cruise Ship Reviews | Luxury Resort | Jet | Yacht | and Travel Tech More Info

Cruising Review Topics and Articles More Info

Software based on Filemaker for the travel industry More Info

The Burgenstock Resort: Reviews on CruisingReview website... More Info

Resort Reviews: World Class resorts... More Info

The Riffelalp Resort: Reviews on CruisingReview website... More Info

CONTACT TEL: 608-238-6001 Email: greg@cruisingreview.com | RSS | AMP