Coverage, Redundancy and Size-Awareness in Genre Diversity for Recommender Systems
Uploaded by: Alt-Tab
Upload date: 2019-08-23 09:21:59

Comments:

# In short Aim at introducing a diversity notion for recommendation which combines different existing notions of diversity (intra-list diversity, coverage, redundancy), and then apply re-ranking technique. # Summary ### Introduction * approach based on genre Intra-List Similarity * they aim at 3 different properties: genre coverage, (non-)redundancy, list size awareness * dataset: movies rec with Netflix prize ### Related Work ##### Diversity in Recommender Systems * Herlocker et al : accuracy alone insufficient to assess satisfaction * McNee et al : defining properties related to satisfaction (coverage, diversity, novelty, serendipity) * ref 14 (Pu et al) : increasing diversity increases satisfaction * ref 22 (Ziegler et al) : introduce Intra-List Diversity * ref 7 (Clarke et al) : ILD limited when considering query results, as queries are short and ambiguous * ref 15 (Santos et al) : propose to cover a maximum of subtopics in the first results (as for a web research) ##### Measuring and enhancing diversity * frameworks to improve diversity largely rely on re-ranking * usual approach: greedy selection, assumes the definition of an objective function (see algo1, à la Ziegler), pairwise framework, measure based on the ILS (or ILD); in ref 21 (Zhang and Hurley) same kind of strategy * framework intent-aware: optimization of coverage (particularly to circumvent ambiguity problems), ref15 proposes xQuAD for example * framework proportionality aims at covering topics proportionally to the user interest, ref 9 (Dang and Croft) for example ### Characterizing genres * what characterizes a genre * following limitations (hierarchy of meaning, unbalanced distribution, overlap between genres, ...) * dataset Netflix: 100M ratings (1 to 5), 480.000 users, around 18000 movies; genres extracted from IMDB => info on 9300 movies (meaning 83% of the ratings) ### Measuring genre diversity in recommendation lists * a diversity measure should capture genre coverage (covering a maximum of genres, proportionally to user interest) * redundancy (important that items in the list cover a genre but also that other items do not cover this genre) * size-awareness (the previous two should take into account the size of the rec list, e.g. if the list is short only most important genres) * limitations of the literature: Ziegler's ILS, ref5's MMR are pairwise notions which are not well suited to evaluate notions such as a genre generality * intent-aware frameworks (refs 2 and 15) do not fully account for the idea that it is important that items do not cover a genre represented in the list, assumes that genres are independent from each other * ref9 (Dang and Croft) use the notion of proportionality to the user interest but do not penalize redundancy * no existing method take the length of the list into account ### Binomial framework for genre diversity * general principle: random distribution is considered as reference for optimal => model likelihood for a genre to randomly appear in a list according to a binomial distribution ##### Binomial diversity metric * selection of an item from a genre is seen as a Bernoulli test * n.b.: theoretically selection without replacement, practically nearly equivalent to selection with replacement * formal definitions: item i covers genre G(i) ; k_g^s = number of success on set s that item has genre g ; p_g" is proportion of interactions of a user with genre g (local importance) ; p_g' is proportion of interactions of all users with genre g (global importance) ; p_g = (1-alpha).p_g' + alpha.p_g" is the expected probability of a genre g to be in rec list R * coverage score: product of the probabilities for the genres not represented in R not to be selected randomly following the Bernoulli process (eq9) * non-redundancy score: measures how probable it is that a genre appears at least k times in R (so it's a kind of remaining tolerance) (eq10) * binomial diversity = coverage . non-redundancy * BinomDiv has appropriate properties: maximizes coverage as a function of p_g, penalizes over-representation of genres, adapts to the list length with the number of tests to do to create R ##### Binomial re-ranking algorithm * greedy re-ranking to optimize a trade-off function between relevance and diversity (eq13), parametrized by lambda ##### Qualitative analysis * results in Table 3: see how various diversity metrics behave in 4 different specific ranking situations ; principal conclusion is that BinomDiv is the only one which works all for all these situations * results in Table 4: (item based kNN + reranking) ; we observe the qualitative results of the reranking, depending on the user tastes ### Experiments * Two experiments with two datasets: Netflix prize + imdb genres (83M ratings, 480K users, 9300 movies, 28 genres) * MovieLens 1M ##### Setup * 5-fold cross-validation * RS rank all movies above a given threshold (grade) for the user considered + 1000 random movies of the dataset * RS tested: item-based CF kNN ; CF implicit Matrix Factorization ; item popularity ; random * reranking optimization is done with a grid search on lambda (trade-off diversity/relevance parameter) * diversity evaluation with all index in the literature (EILD, ERR-IA, CPR) + subtopic recall + subtopic per item * relevance evaluation with nDCG ##### Results for baseline diversity * Tab5: résults without diversification reranking (reminder: alpha reflects personalization degree) * random: very low relevance ; strong diversity * popularity: better relevance ; weaker diversity * personalized RS tend to have weaker non-personalized diversity scores but improve when the user history is taken into account ##### Results for diversified results * Tab6: résults after reranking, cutoff 20 items ; alpha =0.5 ; best lambda found with grid search * all diversifications => accuracy decreases * any diversification process is best when diversity evaluation is realized with it * xQuAD and ERR-IA tend to accumulate genres without penalizing redundancy * ERR-IA and CPR-rel correlated to SPI (subtopics per item) * Fig3: view improvement to baseline * BinomDiv can improve to baseline for nearly every diversity metric * general conclusion: BinomDiv able to bring more coverage while limiting redundancy * Tab7: explores size-awareness by changing cut-off value, diversification relative to lambda always best with the corresponding size [?]
Alt-Tab at 2019-08-23 09:52:44
Edited by Alt-Tab at 2019-08-23 12:45:53

You comment anonymously! You will not be able to edit/delete the comment.

Please consider to register or login.

Use $\LaTeX$ to type formulæ and markdown to format text.
When you post something to which you hold the copyright you authorise us to do distribute this data across the scientific community. You can post public domain content. All user-generated content will be freely available online. Please see this page to learn more about Papersγ's terms of use and privacy policy.