# Short summary
- The authors propose a recommender system that is a hybrid of collaborative filtering and content-based recommendation
- The content-based part essentially relies on a Heterogeneous Information Network (HIN) structure, with various kinds of content nodes (Figure 2 shows an explicit example)
- Broadly speaking, their approach consists in running a random walk on the HIN structure
- More precisely, it is a RW with restart (which allows personalizing the results), and a Vertex-Reinforced Random Walk (VRRW), i.e. a specific kind of RW where future transitions to a node are more probable if this node has been visited in the past (a sketch of the walk mechanics follows this list)
- VRRWs are not Markovian processes, which translates into the fact that the transition matrix must be updated along the walk
- There are many technical details about the implementation, which could be useful to someone who wants to apply the method in practice
- The learning process is achieved through Stochastic Gradient Descent
- They evaluate the efficiency of their approach (Div-HeteRec) on two Meetup datasets: one in Bangalore, another in Hyderabad (general stats in Tab.1)
- Performances are compared to RWR, CF-NMF and two versions of their method with fewer parameters (Uni-HeteRec, without personalization; Learn-HeteRec, with a static transition matrix)
- Results in terms of accuracy (precision, recall and NDCG@k, with k=1,2,3,5,10) are compiled in Tables 2 to 5 for two different recommendation problems: group-to-user and tag-to-group
- Div-HeteRec performs better for group-to-user recommendation (performance is better on the Hyderabad dataset than on the Bangalore one)
- Learn-HeteRec performs better for tag-to-group recommendation
- They provide hypothetical justifications for their observations
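
As a sketch of the walk mechanics described above: a minimal vertex-reinforced random walk with restart in Python. The reinforcement rule used here (target weights scaled by 1 + visit count) and all names are illustrative assumptions, not the authors' exact update scheme.

```python
import numpy as np

def vrrw_with_restart(A, start, n_steps=10_000, restart_p=0.15, seed=0):
    """Minimal VRRW with restart on a weighted adjacency matrix A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    visits = np.zeros(n)
    node = start
    for _ in range(n_steps):
        visits[node] += 1
        if rng.random() < restart_p:   # restart => personalization around `start`
            node = start
            continue
        # assumed reinforcement: edge weights scaled by visit counts of targets
        w = A[node] * (1.0 + visits)   # non-Markovian: depends on the walk's history
        if w.sum() == 0:               # dangling node: fall back to a restart
            node = start
            continue
        node = rng.choice(n, p=w / w.sum())
    return visits / visits.sum()       # visit frequencies ~ recommendation scores
```

With the visit counts frozen this reduces to a plain RW with restart; letting them evolve is exactly why the transition matrix must be updated as the walk proceeds.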
Very good article from the GroupLens team about the diversity-narrowing effect and the role of the recommender system. Experimental measurements on the MovieLens platform; movie content is described with the tag-genome, and the RS is an item-item collaborative filtering.
### Introduction
##### two research questions:
- Do recommender systems expose users to narrower content over time?
- How does the experience of users who take recommendations differ from that of users who do not regularly take recommendations?
##### method
- users in two categories (following / ignoring ~ control group)
- then a sort of A/B testing to measure consumption diversity and enjoyment at the individual level
##### 4 contributions:
- method to analyze the effects of RS on users
- give quantitative evidence that users who follow the RS have a better experience than others
- show that the diversity-reduction effect is small
- show that users who follow the RS tend to consume more diverse content
### Related work
- Pariser (11) on filter bubble
- Tetlock (16) measures bias induced by bubbles (among economists)
- Sunstein (15) thinks that personalization tends to reduce the space of shared experience among users
- Negroponte (MIT Media Lab co-founder) argues that algorithms may also open horizons
- Linden (Amazon RS contributor, 5) thinks that recommendations can generate some form of serendipity, and opposes Pariser's thesis
- Fleder et al (2) use simulation to show that RS tend to homogenize experience / Hosanagar et al (3) use measurements; but there are limitations to these studies, e.g. a simplistic model of human behavior (Fleder)
### Data & metrics
##### MovieLens dataset (September 2013):
- 220,000 users
- 20M ratings, 20,000 movies
- the RS (item-item Collaborative Filtering, similar to Amazon's) proposes "top picks" per user (15 by default; sketch after this list)
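
A minimal sketch of an Amazon-style item-item CF producing "top picks" (not MovieLens's actual implementation; the zero-means-unrated convention and all names are assumptions):

```python
import numpy as np

def top_picks(R, user, k=15):
    """R: (n_users, n_items) rating matrix, 0 = unrated.
    Scores unseen items by cosine similarity to the user's rated items."""
    norms = np.linalg.norm(R, axis=0) + 1e-12
    S = (R.T @ R) / np.outer(norms, norms)  # item-item cosine similarities
    np.fill_diagonal(S, 0.0)                # an item should not recommend itself
    scores = S @ R[user]                    # weight similarities by the user's ratings
    scores[R[user] > 0] = -np.inf           # hide already-rated items
    return np.argsort(scores)[::-1][:k]     # indices of the k top picks
```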
##### Tag-genome to describe movie content
- information space which describes movies with tags given by users
- 9,500 movies in the tag-genome (April 2013) and 1,100 tags => 10-11M pairs
##### Time period
- 21 months from Feb 2008 to Aug 2010 (because less missing data)
##### preprocessing:
- the first 15 ratings are taken out (sign-up propositions from the platform)
- then the first 3 months are taken out (as many ratings are "catching up", and to give users time to get used to ML and ML time to get info from the user)
##### recommendation-block definition:
- several possibilities: per login session or per period of time, but users do not all have the same activity level
- => blocks of 10 consecutive ratings; 10 is roughly the median number of ratings per 3 months
##### users and items:
- only users whose first rating falls within the period of interest and who have at least 3 rating blocks => 1,400 users with 3 to 200 rating blocks
- 173,000 ratings on 10,500 movies, all in the tag-genome (small inconsistency: there are only 9,500 movies in the tag-genome)
##### Identifying consumed recommendations in a rating block
- groups based on the number of recommendations followed by users
- criterion for a recommendation to count as followed: the movie was in the list of top picks between 3 hours and 3 months before the rating
##### Ignoring group vs following group
- 2 groups of users
- problem: some users take a lot of recommendations during one period and then very few
- => rank users by the proportion of rating blocks during which they followed a reco (Fig4)
- then if >50%: following group (286 users); if 0%: ignoring group (430 users); see the snippet below
(seems quite ad hoc, but makes sense, as the differences are not very large and following a reco must be quite rare)
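
The grouping rule is mechanical enough to write down; a small sketch with hypothetical inputs (one boolean per rating block):

```python
def assign_group(followed_per_block):
    """followed_per_block: one boolean per rating block, True if the
    user followed at least one recommendation in that block."""
    frac = sum(followed_per_block) / len(followed_per_block)
    if frac > 0.5:
        return "following"   # 286 users in the paper
    if frac == 0:
        return "ignoring"    # 430 users
    return None              # in between: excluded from both groups

print(assign_group([True, True, False]))    # 'following'
print(assign_group([False, False, False]))  # 'ignoring'
```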
##### Measuring content diversity: tag genome (see figure 5)
- movie-tag matrix with relevance scores from 1 to 5
- a movie is then a vector of scores
- the distance between movies is the Euclidean distance in this space (rather than cosine similarity, because of the matrix density)
- orders of magnitude: min distance 5.1 (Halloween 4 - Halloween 5), max distance 44.2 (The Matrix - Paris Was a Woman), mean 23.4
- the authors argue that the tag-genome is very expressive (more than cosine similarity) and also benefits from the continuous input from users (illustrated on examples)
##### Measuring the effect of recommender systems
###### content diversity metrics (standard)
- average pairwise distance of the movies in the list
- maximum pairwise distance of the movies in the list (sketch after this list)
- on recommended movies (15 top picks)
- on rated movies
- more or less normally distributed (see fig6)
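
The two list-level metrics in code, assuming each movie is represented by its tag-genome vector (hypothetical data):

```python
import numpy as np
from itertools import combinations

def list_diversity(movie_vectors):
    """Average and maximum pairwise Euclidean distance of a list of
    tag-genome vectors (e.g. the 15 top picks, or a rating block)."""
    d = [np.linalg.norm(a - b) for a, b in combinations(movie_vectors, 2)]
    return np.mean(d), np.max(d)

# three hypothetical movies with 4 tag-relevance scores each
movies = [np.array(v) for v in ([1, 5, 2, 1], [2, 4, 2, 1], [5, 1, 4, 3])]
print(list_diversity(movies))
```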
###### user experience:
- average rating per user
- more or less normally distributed
###### analysis:
- mean shift for users in both groups
- measure difference between groups and within group at different times
- within group: use a standard t-test (same sample sizes)
- between groups: use Welch's t-tests (different sizes); see the snippet below
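
Both tests as scipy calls, on made-up numbers (the group sizes match the paper, the values do not); I read the within-group "standard t-test" as a paired test, since the same users are measured twice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# made-up per-user diversity values (e.g. mean pairwise distance)
following_first = rng.normal(24.0, 2.0, size=286)
following_last = following_first - rng.normal(0.5, 1.0, size=286)
ignoring_last = rng.normal(22.5, 2.5, size=430)

# within group: same users at two times => paired t-test
print(stats.ttest_rel(following_first, following_last))
# between groups: different sizes and variances => Welch's t-test
print(stats.ttest_ind(following_last, ignoring_last, equal_var=False))
```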
### Results
##### Research Question 1: do RS expose users to narrower content over time?
- diversity on recommended movies: see tab2
- statistically significant drop for the following group
- significant drop (?) for the ignoring group
- following more diverse than ignoring (significant), but the gap narrows over time
##### Research Question 2: does the experience of users who take recommendations differ from that of users who do not regularly take recommendations?
- diversity on rated movies using mean distance: see tab3
- beginning: no significant difference
- end: significant drop for both groups
- diversity on rated movies using max distance: see tab4
- same trend
- enjoyment evaluated via ratings, comparing the first and last blocks of each group: see tab5
- the following group gives higher ratings, and the rating drop is smaller in the following group than in the ignoring one
- similar trend with rating means (see tab7)
- refined analysis in tab6, with ratings broken down by whether the rated movie was recommended or not => better experience if the movie was recommended
(complement)
- rank all blocks of all groups at all times together, by increasing average rating
- => each block is located in this ranking by its percentile (e.g. first block of the ignoring group ~ 63rd percentile); see the snippet below
- in tab8: percentile drop for both groups => the percentile drop is large for the ignoring group (-19) but not for the following group (-1)
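
The percentile bookkeeping is a one-liner with scipy (numbers made up):

```python
from scipy import stats

# made-up average ratings of all blocks, all groups, all times
all_block_means = [2.9, 3.1, 3.3, 3.5, 3.6, 3.8, 4.0, 4.1]
print(stats.percentileofscore(all_block_means, 3.8))  # percentile of one block
```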
### Discussion
##### summary
- diversity tends to narrow over time in any case
- the effect is subdued for the group that follows recommendations
- users following recommendations get more diverse recommendations
- ratings seem to encourage the RS to broaden recommendations
##### prospects
- is there a natural trend to narrow consumption diversity?
- they think that item-item CF slows this effect down
- the RS can also inform the user of his or her consumption diversity
- finally RS can be designed to intentionally force diversity
##### limitations
- restriction to top picks
- restriction to one dataset and one system (item-item CF)
(note: on the current platform (May 2019), there are 4 different RS: peasant/bard/warrior/wizard; the one described in the article seems to be warrior, the default RS)
Very interesting article.
I'd like to get their data on the "Top Picks for you".
There's probably a correlation between movie releases, what users watch, and what pops into "Top Picks for you", a source of (minor?) bias.
They use one metric for content diversity; there's room to explore other ones. They use Ziegler's work on intra-list diversity ([21]).
Question asked: how can you audit a recommender system if you don't have access to user-item interactions?
Answer proposed: use a "recommendation graph" and simulate user behavior with a random walk with teleportation.
Legitimate question; I am not completely convinced by the answer, but it has some merits, in particular simplicity.
### Introduction
- to measure: diversity / segregation-polarization (not accuracy)
- use the structure of an underlying network
- 3 datasets, all movie RS: IMDB, Google Play, Netflix; films have a genre from a list
##### Contributions: quantifying diversity
- use a graph notion: a directed link from i to j if j is recommended on i's page
- measures based on film genres; popularity based on the graph structure: in-degree, PageRank
- segregation quantified with concentration and evenness (8: Chakraborty et al, 2017)
### Related work
##### diversity in RS
- 9: Nguyen et al, WWW 2014: CF impact on user behaviors
- 10: Zhou et al, PNAS 2010 (Auralist) defines novelty
- 11: Santini and Castells define diversity and novelty with a fuzzy interpretation
- 12: Vargas and Castells, RecSys 2011: novelty and diversity from a user perspective
- 13: Lathia et al, SIGIR 2010: considers temporal aspects
##### on polarization dynamics
- 20: DeGroot's explanation from opinion modeling
- 17: Dandekar et al, PNAS 2013: polarizing effects of RS algorithms
- two main schools of thought to explain polarization: either opinion is reinforced by interactions with like-minded people, or people are exposed to opposite views and then reject them (hence polarization)
### Framework for auditing RS
##### Network construction: see above
- possible to give weights to the links in the network, depending on item-item similarities or on the rank of j among the recommendations on i's page
- but overall different from the literature: a directed network, and not based on similarity
##### User modeling: see above (RW with teleportation), since we don't have access to navigation logs
- then we consider the distribution of genres visited during the RW (sketch below)
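
A minimal sketch of the user model, assuming the recommendation graph is given as an adjacency dict (all names hypothetical):

```python
import numpy as np

def simulate_user(adj, genre, n_steps=400, teleport_p=0.1, seed=0):
    """RW with teleportation on the directed recommendation graph.
    adj: dict node -> list of nodes recommended on its page;
    genre: dict node -> genre label.
    Returns the distribution of genres visited along the walk."""
    rng = np.random.default_rng(seed)
    nodes = list(adj)
    node = nodes[rng.integers(len(nodes))]
    counts = {}
    for _ in range(n_steps):
        counts[genre[node]] = counts.get(genre[node], 0) + 1
        if rng.random() < teleport_p or not adj[node]:
            node = nodes[rng.integers(len(nodes))]  # teleport (or dead end)
        else:
            out = adj[node]
            node = out[rng.integers(len(out))]      # follow a recommendation
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}
```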
##### Datasets: see above
- general stats: see tab1
- collected by snowball sampling (from one node, then neighbors, then neighbors of neighbors... like a BFS)
- to control for personalization effects, the crawling is done from a single IP address
- film genres: 15 for GP, 29 for Netflix and IMDB
- compare genre distributions for a 400-step RW with no teleportation (so, a sort of random sampling): GP dominated by Action, IMDB more balanced
### Diversity in RS
##### existing measures
- similarity between items computed with the Jaccard coefficient
=> possible to compute the usual measures (ILS (15), long-tail novelty (14), unexpectedness (23), source-list diversity (16)); see the sketch below
- Table 2: traditional measures on the datasets
- observation: Netflix has greater source-list diversity, but IMDB is more diverse according to the other measures
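
A sketch of the Jaccard building block and one common ILS variant (average pairwise similarity within a list; Ziegler's original ILS is sum-based, so treat this as one plausible reading, with genre sets as the assumed attributes):

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two attribute sets (e.g. genres)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def intra_list_similarity(item_attrs):
    """Average pairwise Jaccard similarity within a recommendation list."""
    pairs = list(combinations(item_attrs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

print(intra_list_similarity([{"Action"}, {"Action", "Thriller"}, {"Drama"}]))
```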
##### new measures
- assortativity by genre and by popularity (using normalized centralities as proxies: in-degree, PageRank) => see Table 3
- contingency matrices: fraction of links from genre to genre (fig2); in general, RS recommend within the same genre, plus some specific relations between particular genres (sketch below)
- equivalent for popularity: popularity bins (bottom/middle/top), then count links from one bin to another (fig3); in general a push toward the long tail, especially on IMDB
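
The genre-to-genre contingency matrix is a straightforward count over the links (a sketch; names hypothetical):

```python
import numpy as np

def genre_contingency(edges, genre, genre_order):
    """Row-normalized fraction of directed links from genre to genre.
    edges: iterable of (i, j) recommendation links."""
    idx = {g: k for k, g in enumerate(genre_order)}
    C = np.zeros((len(genre_order), len(genre_order)))
    for i, j in edges:
        C[idx[genre[i]], idx[genre[j]]] += 1
    rows = C.sum(axis=1, keepdims=True)
    return C / np.where(rows == 0, 1, rows)  # each source genre sums to 1
```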
##### New measures based on RW
- entropy of the genre distribution obtained (sketch below)
- exploration when parameters vary (N = RW length, t_p = teleportation probability, starting point)
- fig4a: evolution with t_p (small increase then plateau)
- fig4b: evolution with N (growth)
- note that the RW is finite, so the steady state is not reached
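
The entropy measure itself, to combine with simulate_user above for the t_p / N exploration:

```python
import numpy as np

def genre_entropy(dist):
    """Shannon entropy (bits) of a genre distribution {genre: prob}."""
    p = np.array([v for v in dist.values() if v > 0])
    return float(-(p * np.log2(p)).sum())

print(genre_entropy({"Action": 0.5, "Drama": 0.3, "Comedy": 0.2}))  # ~1.49
```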
##### Information segregation in RS
- use the measures of Chakraborty et al. (8): evenness and concentration
- evenness: to what extent a group is exposed uniformly to info units; it is (1 - Gini), with the Gini coefficient computed on the genres consumed by the users of the group (1 ~ even consumption)
- concentration: 1/2 · Σ_i |c_i - a_i|, with c_i the fraction of the group's consumed films of genre i and a_i the fraction of all films of genre i (my best reading of the formula); if concentration is low, what is consumed is close to what is proposed (sketch below)
- results in fig5
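
Sketches of both measures under my reading above; the concentration formula in particular is a reconstruction (total-variation distance between consumed and proposed genre shares), so treat it as an assumption rather than the exact definition of (8):

```python
import numpy as np

def gini(x):
    """Gini coefficient of a nonnegative vector."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    lorenz = np.cumsum(x) / x.sum()
    return (n + 1 - 2 * lorenz.sum()) / n

def evenness(consumed_by_genre):
    """1 - Gini of the group's per-genre consumption (1 ~ uniform)."""
    return 1.0 - gini(consumed_by_genre)

def concentration(consumed_by_genre, available_by_genre):
    """Assumed form: 0.5 * sum_i |c_i - a_i| over genre shares.
    Low => the group consumes roughly what the RS proposes."""
    c = np.asarray(consumed_by_genre, dtype=float)
    a = np.asarray(available_by_genre, dtype=float)
    return 0.5 * np.abs(c / c.sum() - a / a.sum()).sum()

print(evenness([10, 10, 10]))                # 1.0: perfectly even exposure
print(concentration([8, 1, 1], [1, 1, 1]))   # ~0.47: consumption is skewed
```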