Coverage, Redundancy and Size-Awareness in Genre Diversity for Recommender Systems


Authors: Saúl Vargas, Linas Baltrunas, Alexandros Karatzoglou, Pablo Castells
Domains: Recommendation
Tags: diversity, recommendation system, coverage

Uploaded by: Alt-Tab
Upload date: 2019-08-23 09:21:59

Comments:

# In short

The paper introduces a notion of diversity for recommendation that combines several existing ones (intra-list diversity, coverage, redundancy), and then applies a re-ranking technique to optimize it.

# Summary

### Introduction

* approach based on genre Intra-List Similarity
* they aim at 3 different properties: genre coverage, (non-)redundancy, list size awareness
* dataset: movie recommendation with the Netflix Prize data

### Related Work

##### Diversity in Recommender Systems

* Herlocker et al: accuracy alone is insufficient to assess satisfaction
* McNee et al: define properties related to satisfaction (coverage, diversity, novelty, serendipity)
* ref 14 (Pu et al): increasing diversity increases satisfaction
* ref 22 (Ziegler et al): introduce Intra-List Diversity (ILD)
* ref 7 (Clarke et al): ILD is limited when considering query results, as queries are short and ambiguous
* ref 15 (Santos et al): propose to cover a maximum of subtopics in the first results (as in web search)

##### Measuring and enhancing diversity

* frameworks to improve diversity largely rely on re-ranking
* usual approach: greedy selection, which assumes the definition of an objective function (see algo 1, in the style of Ziegler); pairwise framework, with a measure based on the ILS (or ILD); ref 21 (Zhang and Hurley) uses the same kind of strategy
* intent-aware framework: optimization of coverage (particularly to circumvent ambiguity problems); ref 15 proposes xQuAD for example
* proportionality framework: aims at covering topics proportionally to the user interest; ref 9 (Dang and Croft) for example

### Characterizing genres

* what characterizes a genre
* followed by its limitations (hierarchy of meaning, unbalanced distribution, overlap between genres, ...)
* Netflix dataset: 100M ratings (1 to 5), 480,000 users, around 18,000 movies; genres extracted from IMDb => info on 9300 movies (covering 83% of the ratings)

### Measuring genre diversity in recommendation lists

A diversity measure should capture:

* genre coverage (covering a maximum of genres, proportionally to user interest)
* redundancy (it is important that items in the list cover a genre, but also that the other items do not cover this same genre)
* size-awareness (the previous two should take into account the size of the recommendation list, e.g. if the list is short, only the most important genres)

Limitations of the literature:

* Ziegler's ILS and ref 5's MMR are pairwise notions, which are not well suited to evaluate notions such as genre generality
* intent-aware frameworks (refs 2 and 15) do not fully account for the idea that it is important that items do not cover a genre already represented in the list, and assume that genres are independent from each other
* ref 9 (Dang and Croft) uses the notion of proportionality to the user interest but does not penalize redundancy
* no existing method takes the length of the list into account

### Binomial framework for genre diversity

* general principle: the random distribution is taken as the reference for optimality => model the likelihood that a genre randomly appears in a list with a binomial distribution

##### Binomial diversity metric

* selection of an item from a genre is seen as a Bernoulli trial
* n.b.: theoretically this is selection without replacement, practically it is nearly equivalent to selection with replacement
* formal definitions:
  * item i covers the genres G(i)
  * k_g^s = number of successes on set s, i.e. the number of items in s that have genre g
  * p_g'' is the proportion of interactions of a user with genre g (local importance)
  * p_g' is the proportion of interactions of all users with genre g (global importance)
  * p_g = (1 - alpha) . p_g' + alpha . p_g'' is the expected probability of a genre g to be in the recommendation list R
* coverage score: product, over the genres not represented in R, of the probabilities that they are not selected at random under the Bernoulli process (eq 9)
* non-redundancy score: measures how probable it is that a genre appears at least k times in R (so it is a kind of remaining tolerance) (eq 10)
* binomial diversity = coverage . non-redundancy
* BinomDiv has the appropriate properties: it maximizes coverage as a function of p_g, penalizes over-representation of genres, and adapts to the list length through the number of trials needed to build R

##### Binomial re-ranking algorithm

* greedy re-ranking to optimize a trade-off function between relevance and diversity (eq 13), parametrized by lambda

##### Qualitative analysis

* results in Table 3: shows how various diversity metrics behave in 4 different specific ranking situations; the principal conclusion is that BinomDiv is the only one which behaves well in all of these situations
* results in Table 4 (item-based kNN + re-ranking): we observe the qualitative results of the re-ranking, depending on the user tastes

### Experiments

* Two experiments with two datasets:
  * Netflix Prize + IMDb genres (83M ratings, 480K users, 9300 movies, 28 genres)
  * MovieLens 1M

##### Setup

* 5-fold cross-validation
* the RS ranks all movies rated above a given threshold by the user considered + 1000 random movies of the dataset
* RS tested: item-based CF kNN; CF implicit Matrix Factorization; item popularity; random
* the re-ranking is tuned with a grid search on lambda (trade-off parameter between diversity and relevance)
* diversity evaluation with the metrics from the literature (EILD, ERR-IA, CPR) + subtopic recall + subtopics per item
* relevance evaluation with nDCG

##### Results for baseline diversity

* Tab 5: results without diversification re-ranking (reminder: alpha reflects the degree of personalization)
* random: very low relevance; strong diversity
* popularity: better relevance; weaker diversity
* personalized RS tend to have weaker non-personalized diversity scores, but improve when the user history is taken into account

##### Results for diversified results

* Tab 6: results after re-ranking, cutoff 20 items; alpha = 0.5; best lambda found with grid search
* all diversifications => accuracy decreases
* each diversification method scores best when diversity is evaluated with its own metric
* xQuAD and ERR-IA tend to accumulate genres without penalizing redundancy
* ERR-IA and CPR-rel are correlated with SPI (subtopics per item)
* Fig 3: shows the improvement over the baseline
* BinomDiv can improve over the baseline for nearly every diversity metric
* general conclusion: BinomDiv is able to bring more coverage while limiting redundancy
* Tab 7: explores size-awareness by changing the cut-off value; the diversification tuned (through lambda) for a given size is always the best at the corresponding size [?]
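
To make the binomial metric and the greedy re-ranking more concrete, here is a minimal Python sketch of how I read them. It is not the authors' code. The function names (`binom_pmf`, `binom_div`, `greedy_rerank`) are hypothetical. The geometric-mean exponents, the conditioning of the non-redundancy term on the genre appearing at least once, and the plain linear trade-off `(1 - lam) * relevance + lam * diversity` are assumptions that the notes above do not spell out. It also assumes 0 < p_g < 1 for every genre and relevance scores already scaled to [0, 1].

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_div(rec_genres, p):
    """Binomial diversity of a list (sketch).

    rec_genres: list of genre sets, one per recommended item.
    p: dict genre -> p_g, assumed to satisfy 0 < p_g < 1.
    """
    n = len(rec_genres)
    # k_g: number of items in the list that carry genre g
    counts = {g: sum(g in genres for genres in rec_genres) for g in p}
    covered = [g for g in p if counts[g] > 0]
    coverage = 1.0
    non_red = 1.0
    for g in p:
        p_zero = binom_pmf(0, n, p[g])
        if counts[g] == 0:
            # Missing genre: probability that it is also absent at random.
            # Exponent 1/|G| is an assumed normalisation.
            coverage *= p_zero ** (1 / len(p))
        else:
            # Covered genre: P(X >= k_g | X > 0), assumed conditioning.
            below = sum(binom_pmf(l, n, p[g]) for l in range(1, counts[g]))
            tail = max(0.0, 1 - below / (1 - p_zero))
            non_red *= tail ** (1 / len(covered))
    return coverage * non_red


def greedy_rerank(candidates, p, lam, cutoff):
    """Greedy re-ranking (sketch).

    candidates: list of (item_id, relevance, genre_set) tuples.
    lam: trade-off weight, 0 = relevance only, 1 = diversity only.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < cutoff:
        def score(cand):
            genres = [g for _, _, g in selected] + [cand[2]]
            return (1 - lam) * cand[1] + lam * binom_div(genres, p)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [item_id for item_id, _, _ in selected]
```

With two equally likely genres (`p = {"a": 0.5, "b": 0.5}`), the list `[{"a"}, {"b"}]` scores 1.0 and the redundant list `[{"a"}, {"a"}]` scores about 0.17. In the same toy setting, re-ranking three candidates `m1` (relevance 1.0, genre a), `m2` (0.9, genre a) and `m3` (0.8, genre b) with a cutoff of 2 keeps `m1, m2` for `lam = 0` and switches to `m1, m3` for `lam = 0.5`.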

Alt-Tab at 2019-08-23 09:52:44
Edited by Alt-Tab at 2019-08-23 12:45:53

Papers^γ
